arxiv: 2604.22839 · v1 · submitted 2026-04-21 · 💻 cs.CV · cs.AI


From Skeletons to Pixels: Few-Shot Precise Event Spotting via Representation and Prediction Distillation

Zhong Han Ervin Yeoh, Jiang Kan


Pith reviewed 2026-05-10 03:07 UTC · model grok-4.3


keywords few-shot learning · precise event spotting · multimodal distillation · skeleton data · video event detection · representation learning · pseudo-labeling · sports analytics


The pith

Distilling knowledge from skeleton-based models into pixel-based video models enables accurate few-shot precise event spotting in sports.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to improve precise event spotting, the task of pinpointing the exact moments at which events occur in fast sports actions, when only a few labeled examples are available. It develops distillation techniques that transfer knowledge from skeleton-based pose models to pixel-based video models. Adaptive Weight Distillation (AWD) adjusts the influence of teacher predictions on unlabeled clips, and Annealed Multimodal Distillation for Few-Shot Event Detection (AMD-FED) gradually transfers representation knowledge using pseudo-labels. Tests with a limited number of labeled clips from tennis and figure skating videos show gains over single-modality baselines. This approach could make detailed video analysis practical when full annotation is expensive.
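To make the prediction-level idea concrete, here is a minimal sketch of a confidence-weighted distillation loss on unlabeled frames. It is not the paper's AWD formulation, which this page does not spell out. The weighting rule (one minus normalized teacher entropy) and all function names are assumptions chosen only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def adaptive_weight(teacher_probs):
    """Hypothetical weight: 1 - normalized entropy of the teacher prediction.

    A confident teacher (low entropy) gets a weight near 1,
    an uncertain teacher (uniform prediction) gets a weight near 0.
    """
    n = len(teacher_probs)
    entropy = -sum(p * math.log(p) for p in teacher_probs if p > 0)
    return 1.0 - entropy / math.log(n)

def weighted_distillation_loss(teacher_logits, student_logits):
    """Average per-frame KL(teacher || student), scaled by the teacher weight."""
    total = 0.0
    for t, s in zip(teacher_logits, student_logits):
        tp, sp = softmax(t), softmax(s)
        total += adaptive_weight(tp) * kl_divergence(tp, sp)
    return total / len(teacher_logits)
```

Under this assumed rule, frames where the skeleton teacher is unsure contribute little to the student's loss, which is one plausible reading of "adaptively weights teacher supervision on unlabeled data".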

Core claim

We study two complementary distillation strategies for few-shot PES: Adaptive Weight Distillation (AWD), a prediction-level method that adaptively weights teacher supervision on unlabeled data, and Annealed Multimodal Distillation for Few-Shot Event Detection (AMD-FED), a representation-level framework that transfers robust skeleton knowledge into visual modalities through annealed pseudo-labeling. Both methods use multimodal distillation to improve generalization under limited supervision. Evaluations on F3Set-Tennis(sub) under few-shot k-clip settings show consistent outperformance of single-modality baselines and prior PES approaches, with AMD-FED further validated on Figure Skating data.

What carries the argument

Annealed Multimodal Distillation for Few-Shot Event Detection (AMD-FED), a representation-level distillation framework that transfers skeleton knowledge to visual modalities using annealed pseudo-labeling.

Load-bearing premise

Skeleton data provides robust, transferable knowledge to visual modalities via distillation and annealed pseudo-labeling, allowing performance gains on the tested datasets to hold more broadly.

What would settle it

If the AMD-FED model fails to surpass a visual-only baseline when evaluated on a new set of sports videos with only k labeled clips, that would indicate the distillation does not provide the claimed advantage.

Figures

Figures reproduced from arXiv: 2604.22839 by Jiang Kan, Zhong Han Ervin Yeoh.

**Figure 1.1.** Example of fine-grained event annotation in fast-paced sports, with event times

**Figure 3.1.** The framework of our proposed method. AMD-FED has three training stages:

**Figure 4.1.** F1evt and Edit scores under few-shot (k-clip) training for the F3Set-Tennis(sub) dataset. Percentages indicate the fraction of the full training set

**Figure 4.2.** F1evt and Edit scores under few-shot (k-clip) training for the Figure Skating dataset. Percentages indicate the fraction of the full training set.
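The Edit score reported in these figures is commonly computed as a normalized Levenshtein distance between the predicted and reference event sequences. The sketch below follows that common convention; the paper's exact definition may differ, and the event labels used in the usage note are illustrative only.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def edit_score(predicted, reference):
    """Score in [0, 100]; 100 means the event order matches exactly."""
    if not predicted and not reference:
        return 100.0
    dist = levenshtein(predicted, reference)
    return 100.0 * (1.0 - dist / max(len(predicted), len(reference)))
```

For example, `edit_score(["serve", "backhand"], ["serve", "forehand", "backhand"])` penalizes one missing event out of three. The score ignores timing entirely, which is why it is usually reported alongside a frame-level metric such as F1evt.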

Original abstract

Precise Event Spotting (PES) is essential in fast-paced sports such as tennis, where fine-grained events occur within very short temporal windows. Accurate frame-level localization is challenging because of motion blur, subtle action differences, and limited annotated data. We study two complementary distillation strategies for few-shot PES: Adaptive Weight Distillation (AWD), a prediction-level method that adaptively weights teacher supervision on unlabeled data, and Annealed Multimodal Distillation for Few-Shot Event Detection (AMD-FED), a representation-level framework that transfers robust skeleton knowledge into visual modalities through annealed pseudo-labeling. Both methods use multimodal distillation to improve generalization under limited supervision. We evaluate them on F3Set-Tennis(sub) under few-shot k-clip settings, where they consistently outperform single-modality baselines and prior PES approaches. After observing the stronger performance of representation-level distillation on tennis, we further validate AMD-FED on a second sports dataset, Figure Skating, where it also shows robust performance in the k-clip scenario. These results highlight the effectiveness of multimodal distillation, especially representation-level transfer, for few-shot precise event spotting.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Two new distillation strategies for few-shot sports event spotting are a direct extension of existing ideas, but the results on a tennis subset make the generalization claims hard to trust without more checks.


The paper's main move is to define AWD as an adaptive prediction-level distillation method and AMD-FED as a representation-level one that uses annealed pseudo-labeling to move skeleton knowledge into pixel-based models for precise event spotting under few labels. Both target the practical constraints of motion blur and scarce annotations in fast sports video. The authors test the approaches on F3Set-Tennis(sub) in k-clip few-shot regimes and then run AMD-FED on figure skating, reporting better numbers than single-modality baselines and earlier PES work.

That framing of multimodal transfer for temporal localization is a reasonable next step from prior distillation literature. The concrete naming and separation of prediction versus representation routes gives readers something specific to try or modify. The evaluation stays within standard sports datasets and reports consistent gains in the stated setting, which is at least a clean empirical check on the proposed mechanisms.

The clearest soft spot is the choice of F3Set-Tennis(sub). The abstract gives no selection criteria for the subset and no comparison to the full dataset, so any advantage could trace to easier clips rather than the distillation itself. Without error bars, ablation tables, or dataset statistics in the summary, it is also difficult to judge how large or stable the reported improvements actually are. The free parameters around annealing and weighting are noted but not stress-tested in the provided description.

This work is aimed at computer vision groups that already handle video action or few-shot temporal tasks, especially those with access to pose estimators. A reader who needs a ready template for skeleton-to-pixel transfer in low-data regimes could extract usable pieces.

It is worth sending for peer review because the problem is well-motivated and the methods are spelled out enough to replicate or critique, even though the current evidence base needs tightening on dataset scope and quantitative detail before the claims can be taken as settled.

Referee Report

2 major / 2 minor

Summary. The paper proposes two multimodal distillation strategies for few-shot Precise Event Spotting (PES) in sports videos: Adaptive Weight Distillation (AWD), a prediction-level method that adaptively weights teacher supervision on unlabeled data, and Annealed Multimodal Distillation for Few-Shot Event Detection (AMD-FED), a representation-level framework that transfers skeleton knowledge to visual modalities via annealed pseudo-labeling. Both aim to improve generalization under limited supervision. The methods are evaluated on the F3Set-Tennis(sub) dataset under few-shot k-clip settings, where they reportedly outperform single-modality baselines and prior PES approaches, with AMD-FED further validated on a Figure Skating dataset showing robust performance.

Significance. If the empirical gains hold under rigorous evaluation, the work could advance few-shot PES by demonstrating the value of skeleton-to-pixel representation distillation and annealed pseudo-labeling in data-scarce sports video settings. The dual-method comparison (prediction vs. representation level) and cross-dataset validation on tennis and skating provide a useful empirical foundation for multimodal transfer in fine-grained temporal localization tasks.

major comments (2)

[Abstract and Evaluation Section] Evaluation on F3Set-Tennis(sub): The central claim of consistent outperformance rests on results reported only for the (sub) variant plus one additional dataset. No justification is provided for subset selection criteria, and results on the full F3Set-Tennis distribution are not reported, raising the possibility that observed benefits are an artifact of subset curation (e.g., clearer events or lower motion blur) rather than a general property of the distillation methods.
[Abstract] Quantitative evidence: The abstract states that the methods 'consistently outperform' baselines but provides no numerical metrics, error bars, ablation details, dataset statistics, or k-clip values. This makes it impossible to assess effect sizes, statistical significance, or whether the improvements are load-bearing for the few-shot PES claim.
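The error bars requested in the second major comment can be produced with very little machinery. The sketch below assumes each method is trained with several random seeds per k-clip setting and that the baseline and the distilled model share the same seeds. Both are assumptions about the experimental protocol, and the function names are hypothetical.

```python
import statistics

def summarize_runs(scores):
    """Return (mean, sample standard deviation) over per-seed scores."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std

def paired_gain(method_scores, baseline_scores):
    """Mean and spread of per-seed differences (method minus baseline).

    Assumes both lists are ordered by the same seeds, so each
    difference compares runs that saw the same labeled clips.
    """
    if len(method_scores) != len(baseline_scores):
        raise ValueError("score lists must have the same length")
    diffs = [m - b for m, b in zip(method_scores, baseline_scores)]
    return summarize_runs(diffs)
```

Reporting the paired difference rather than two separate means matters in few-shot settings, because the choice of the k labeled clips often contributes more variance than the method itself.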

minor comments (2)

[Abstract] The abstract mentions 'annealed pseudo-labeling' and 'adaptive weighting' but does not define the annealing schedule or weighting parameters, which are listed as free parameters in the analysis.
[Methods] Notation for AWD and AMD-FED should be introduced with explicit equations or pseudocode in the methods section to clarify the distinction between prediction-level and representation-level distillation.
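
For illustration, the distinction requested in the last comment can be written generically as follows. This is a textbook-style formulation under assumed notation, not the paper's own equations: $U$ is the set of unlabeled frames, $p^{T}_{t}$ and $p^{S}_{t}$ are the teacher (skeleton) and student (visual) class distributions at frame $t$, $w_t$ is a per-frame weight, $f^{T}$ and $f^{S}$ are the feature encoders applied to skeleton input $s_t$ and pixel input $x_t$, $g$ is a hypothetical projection head, and $\lambda(e)$ is a coefficient that depends on the training epoch $e$.

```latex
\begin{align}
\mathcal{L}_{\text{pred}} &= \frac{1}{|U|} \sum_{t \in U} w_t \,
    \mathrm{KL}\!\left(p^{T}_{t} \,\middle\|\, p^{S}_{t}\right), \\
\mathcal{L}_{\text{repr}} &= \frac{1}{|U|} \sum_{t \in U}
    \left\lVert g\!\left(f^{S}(x_t)\right) - f^{T}(s_t) \right\rVert_2^2, \\
\mathcal{L}_{\text{total}} &= \mathcal{L}_{\text{sup}}
    + \lambda(e)\, \mathcal{L}_{\text{distill}},
    \qquad \mathcal{L}_{\text{distill}} \in
    \{\mathcal{L}_{\text{pred}}, \mathcal{L}_{\text{repr}}\}.
\end{align}
```

The first line matches outputs, the second matches intermediate features, and the third shows where a schedule such as annealing would enter. Which distance, weight, and schedule the authors actually use is exactly what the comment asks them to state.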

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below and indicate planned revisions to strengthen the presentation of our results.


Referee: [Abstract and Evaluation Section] Evaluation on F3Set-Tennis(sub): The central claim of consistent outperformance rests on results reported only for the (sub) variant plus one additional dataset. No justification is provided for subset selection criteria, and results on the full F3Set-Tennis distribution are not reported, raising the possibility that observed benefits are an artifact of subset curation (e.g., clearer events or lower motion blur) rather than a general property of the distillation methods.

Authors: We appreciate this observation. The F3Set-Tennis(sub) variant was selected to prioritize clips with reliable annotations and manageable computational demands for few-shot experiments; the full dataset includes additional challenging cases that increase training time substantially. In the revised manuscript we will add an explicit subsection detailing the subset selection criteria, including comparative statistics (e.g., event density, motion statistics) between the sub and full versions. We will also attempt to report results on the full F3Set-Tennis distribution if feasible within revision timelines. The cross-dataset validation on Figure Skating and the consistent gains across multiple k-clip values already provide evidence that the improvements stem from the distillation strategies rather than subset-specific properties.

revision: partial

Referee: [Abstract] Quantitative evidence: The abstract states that the methods 'consistently outperform' baselines but provides no numerical metrics, error bars, ablation details, dataset statistics, or k-clip values. This makes it impossible to assess effect sizes, statistical significance, or whether the improvements are load-bearing for the few-shot PES claim.

Authors: We agree that the abstract would be strengthened by concrete quantitative anchors. In the revised version we will update the abstract to report key performance figures (e.g., F1evt and Edit score gains across the k-clip settings on both datasets) together with a brief mention of the main ablation findings and dataset sizes. This will allow readers to immediately gauge effect sizes while preserving the abstract's brevity.

revision: yes

Circularity Check

0 steps flagged

No circularity: empirical validation on external datasets


The paper proposes two distillation methods (AWD and AMD-FED) for few-shot PES and reports comparative performance on F3Set-Tennis(sub) and Figure Skating. No equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the abstract or described content. Results are presented as empirical outperformance against baselines, with no derivation chain that reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claims rest on the assumption that multimodal skeleton data can be distilled effectively into visual models under limited supervision; this introduces hyperparameters for annealing and adaptive weighting that are not detailed here.

free parameters (2)

annealing schedule
AMD-FED uses annealed pseudo-labeling, which requires choosing an annealing rate or schedule as a hyperparameter.
adaptive weighting parameters
AWD adaptively weights teacher supervision, implying tunable or learned parameters for weighting.
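
As a rough picture of what the annealing hyperparameter controls, the sketch below ramps the weight of a pseudo-label loss linearly over training. The linear shape, the direction of the ramp (from 0 up to 1), and the function names are all assumptions; this page does not state the schedule AMD-FED actually uses.

```python
def annealed_weight(step, total_steps, start=0.0, end=1.0):
    """Linearly interpolate a loss weight from `start` to `end`.

    Progress is clamped to [0, 1], so steps past `total_steps`
    keep the final weight.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress

def combined_loss(supervised_loss, pseudo_label_loss, step, total_steps):
    """Supervised loss plus the pseudo-label loss scaled by the annealed weight."""
    lam = annealed_weight(step, total_steps)
    return supervised_loss + lam * pseudo_label_loss
```

With this assumed ramp, early training leans on the few labeled clips and later training trusts pseudo-labels more. Swapping `start` and `end` gives the opposite behavior, which is why the schedule counts as a free parameter that should be reported.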

axioms (1)

domain assumption: Skeleton representations capture robust action information transferable to pixel-based models
The representation-level distillation strategy depends on this premise for knowledge transfer.


