Rapidly deploying on-device eye tracking by distilling visual foundation models
Pith reviewed 2026-05-13 21:35 UTC · model grok-4.3
The pith
DistillGaze distills visual foundation models with synthetic labels and unlabeled real images to produce accurate 256K-parameter eye trackers deployable on new hardware.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DistillGaze proceeds in two stages. First, a visual foundation model is adapted into a domain-specialized teacher using self-supervised learning on labeled synthetic images and unlabeled real images, where synthetic data supplies gaze supervision and real data bridges the domain gap. Second, a lightweight student model is trained using both teacher guidance and self-training. On a large-scale crowd-sourced dataset with over 2,000 participants, the resulting 256K-parameter model reduces median gaze error by 58.62 percent compared with synthetic-only baselines while remaining suitable for real-time on-device deployment across varying hardware configurations.
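The page gives no code, but the two-stage recipe above is concrete enough to sketch. The snippet below is a minimal PyTorch illustration, not the authors' implementation: the TinyBackbone model, the augment function, the MSE consistency objective, and the L1 gaze losses are all assumptions standing in for unstated details of the teacher adaptation and student distillation stages.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyBackbone(nn.Module):
    """Stand-in encoder + gaze head. Hypothetical; the paper's teacher is an adapted VFM."""
    def __init__(self, dim=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(dim, 2)  # 2D gaze output (yaw, pitch); the paper's output format is not stated

    def features(self, x):
        return self.body(x)

    def forward(self, x):
        return self.head(self.body(x))

def augment(x):
    # Placeholder for the real augmentation pipeline (photometric / geometric jitter).
    return x + 0.05 * torch.randn_like(x)

teacher, student = TinyBackbone(dim=64), TinyBackbone(dim=16)
opt_t = torch.optim.Adam(teacher.parameters(), lr=1e-3)
opt_s = torch.optim.Adam(student.parameters(), lr=1e-3)

# Dummy batches standing in for labeled synthetic and unlabeled real near-eye images.
x_syn, gaze = torch.randn(8, 1, 64, 64), torch.randn(8, 2)
x_real = torch.randn(8, 1, 64, 64)

# Stage 1: adapt the teacher with synthetic gaze supervision plus a
# self-supervised consistency loss between two augmented views of real images.
sup_loss = F.l1_loss(teacher(x_syn), gaze)
ssl_loss = F.mse_loss(teacher.features(augment(x_real)),
                      teacher.features(augment(x_real)).detach())
loss_t = sup_loss + ssl_loss
opt_t.zero_grad(); loss_t.backward(); opt_t.step()

# Stage 2: distill into the compact student with teacher pseudo-labels (guidance)
# plus a self-training consistency term on augmented real images.
teacher.eval()
with torch.no_grad():
    pseudo = teacher(x_real)
pred_s = student(x_real)
loss_s = F.l1_loss(pred_s, pseudo) + F.mse_loss(student(augment(x_real)), pred_s.detach())
opt_s.zero_grad(); loss_s.backward(); opt_s.step()
```

The sketch's only purpose is the data flow: synthetic images carry gaze labels into the teacher, unlabeled real images carry a consistency signal, and the student learns from teacher pseudo-labels plus its own self-training target.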
What carries the argument
DistillGaze, a two-stage distillation framework in which a visual foundation model is first adapted via self-supervised learning on mixed synthetic and real data to create a teacher, then used to supervise a compact student model for gaze regression.
If this is right
- Eye tracking models can be trained and deployed for successive AR/VR device generations without collecting large new labeled real datasets each time.
- A 256K-parameter model supports real-time on-device inference while achieving substantially lower error than larger synthetic-only alternatives (a sizing and latency sketch follows this list).
- The same two-stage process handles changes in camera pose, placement, and illumination without retraining from scratch.
- Synthetic data supplies scalable supervision while unlabeled real data supplies the domain adaptation signal needed for regression tasks.
- Foundation models originally trained on natural images can be specialized for near-eye infrared imagery through this distillation recipe.
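As noted in the list above, the 256K-parameter figure and the real-time claim come from the paper; the sketch below only shows how one might verify that a candidate student sits in a comparable size and latency regime. The architecture, the 64x64 input crop, and the single-channel infrared input are assumptions for illustration.

```python
import time
import torch
import torch.nn as nn

# Hypothetical compact gaze regressor; the actual DistillGaze student is not specified here.
student = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),
)

# Parameter budget check (the paper's student has ~256K parameters; this toy model is smaller).
n_params = sum(p.numel() for p in student.parameters())
print(f"parameters: {n_params / 1e3:.1f}K")

# Rough CPU latency on a single 64x64 near-eye crop (input size is an assumption).
x = torch.randn(1, 1, 64, 64)
student.eval()
with torch.no_grad():
    for _ in range(10):          # warm-up
        student(x)
    t0 = time.perf_counter()
    for _ in range(100):
        student(x)
    print(f"mean latency: {(time.perf_counter() - t0) / 100 * 1e3:.2f} ms")
```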
Where Pith is reading between the lines
- The approach could transfer to other on-device regression problems such as hand tracking or facial landmark detection that also face synthetic-to-real gaps.
- Further gains might come from testing whether different foundation model backbones yield better teachers for the same student size.
- The method implies that improvements in the quality or diversity of synthetic eye images would directly raise the final accuracy ceiling.
- Deployment on additional hardware variants with measured error reduction would confirm the claimed adaptability across device families.
Load-bearing premise
That self-supervised adaptation of a visual foundation model on labeled synthetic data plus unlabeled real data will close the synthetic-to-real domain gap enough for high-accuracy gaze estimation across new hardware configurations.
What would settle it
A new hardware setup with different camera placement or illumination on which the distilled model maintains its median-gaze-error advantage over a synthetic-only baseline without substantial degradation.
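That test reduces to a simple measurement. The sketch below assumes gaze is expressed as 3D direction vectors scored by angular error in degrees; the paper's exact error definition is not reproduced on this page, but the relative median-reduction formula is the form of the reported 58.62% figure. In a real settling test, the error arrays would be grouped per held-out hardware configuration rather than drawn at random.

```python
import numpy as np

def angular_error_deg(pred, gt):
    """Angle in degrees between predicted and ground-truth 3D gaze vectors."""
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=-1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

def relative_median_reduction(err_baseline, err_model):
    """Percent reduction in median error, the form of the paper's 58.62% figure."""
    m_b, m_m = np.median(err_baseline), np.median(err_model)
    return 100.0 * (m_b - m_m) / m_b

# Toy data; a real settling test would compute this separately per held-out hardware setup.
rng = np.random.default_rng(0)
gt = rng.normal(size=(1000, 3))
err_base = angular_error_deg(gt + 0.15 * rng.normal(size=gt.shape), gt)
err_new = angular_error_deg(gt + 0.06 * rng.normal(size=gt.shape), gt)
print(f"relative median error reduction: {relative_median_reduction(err_base, err_new):.2f}%")
```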
Original abstract
Eye tracking (ET) plays a critical role in augmented and virtual reality applications. However, rapidly deploying high-accuracy, on-device gaze estimation for new products remains challenging because hardware configurations (e.g., camera placement, camera pose, and illumination) often change across device generations. Visual foundation models (VFMs) are a promising direction for rapid training and deployment, and they excel on natural-image benchmarks; yet we find that off-the-shelf VFMs still struggle to achieve high accuracy on specialized near-eye infrared imagery. To address this gap, we introduce DistillGaze, a framework that distills a foundation model by leveraging labeled synthetic data and unlabeled real data for rapid and high-performance on-device gaze estimation. DistillGaze proceeds in two stages. First, we adapt a VFM into a domain-specialized teacher using self-supervised learning on labeled synthetic and unlabeled real images. Synthetic data provides scalable, high-quality gaze supervision, while unlabeled real data helps bridge the synthetic-to-real domain gap. Second, we train an on-device student using both teacher guidance and self-training. Evaluated on a large-scale, crowd-sourced dataset spanning over 2,000 participants, DistillGaze reduces median gaze error by 58.62% relative to synthetic-only baselines while maintaining a lightweight 256K-parameter model suitable for real-time on-device deployment. Overall, DistillGaze provides an efficient pathway for training and deploying ET models that adapt to hardware changes, and offers a recipe for combining synthetic supervision with unlabeled real data in on-device regression tasks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces DistillGaze, a two-stage distillation framework that first adapts a visual foundation model into a domain-specialized teacher via self-supervised learning on labeled synthetic and unlabeled real near-eye infrared images, then trains a lightweight student model using teacher guidance and self-training. The central empirical claim is that this yields a 58.62% reduction in median gaze error relative to synthetic-only baselines on a crowd-sourced dataset spanning over 2,000 participants, while producing a 256K-parameter model suitable for real-time on-device deployment and adaptation to hardware changes.
Significance. If the performance gains and cross-hardware generalization hold under more rigorous validation, the work would provide a practical recipe for rapid on-device eye-tracking deployment in AR/VR by efficiently bridging synthetic-to-real gaps without large-scale labeled real data. The lightweight model size and emphasis on unlabeled real data for adaptation are clear strengths for on-device regression tasks.
Major comments (2)
- [Evaluation] The experimental evaluation reports a 58.62% median error reduction but provides no error bars, ablation details on the contribution of each stage, or statistical tests, leaving the robustness of the central performance claim unassessable from the given results.
- [Evaluation] No explicit held-out hardware split, device-type ablation, or cross-device generalization test is described; the single crowd-sourced dataset evaluation therefore does not isolate whether the domain-gap closure works for truly novel camera geometries, poses, or illumination across device generations, which is load-bearing for the adaptation claim.
Minor comments (1)
- The abstract and method description would benefit from explicitly naming the synthetic-only baselines and the precise self-supervised objectives used in each stage.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which highlights important aspects of our evaluation that can be strengthened. We will revise the manuscript to address the concerns regarding robustness and generalization while maintaining the core contributions of DistillGaze.
Point-by-point responses
- Referee: The experimental evaluation reports a 58.62% median error reduction but provides no error bars, ablation details on the contribution of each stage, or statistical tests, leaving the robustness of the central performance claim unassessable from the given results.
  Authors: We agree that these elements are necessary for assessing robustness. In the revised manuscript, we will add error bars (standard deviation over multiple training runs with different random seeds), detailed ablations separating the contribution of the self-supervised teacher adaptation stage from the student self-training stage, and statistical significance tests (e.g., a Wilcoxon signed-rank test) comparing DistillGaze against the synthetic-only baseline; a sketch of such an analysis follows these responses. These additions will directly support the reported 58.62% median error reduction. revision: yes
- Referee: No explicit held-out hardware split, device-type ablation, or cross-device generalization test is described; the single crowd-sourced dataset evaluation therefore does not isolate whether the domain-gap closure works for truly novel camera geometries, poses, or illumination across device generations, which is load-bearing for the adaptation claim.
  Authors: We acknowledge that an explicit held-out hardware split would more rigorously isolate cross-device generalization. Our crowd-sourced dataset inherently includes variations in camera geometries, poses, and illumination across 2,000+ participants, but we did not perform device-type partitioning. In revision, we will add a device-type ablation by grouping samples based on available participant metadata (e.g., inferred device characteristics) and report performance on held-out subsets. We will also expand the discussion to clarify how the two-stage framework enables adaptation to new hardware via unlabeled real data, while noting any limitations due to metadata availability as a direction for future datasets. revision: partial
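For the robustness additions the authors propose in the first response, a minimal version of the paired significance test and a bootstrap interval on the median-error reduction could look like the following. The per-participant error arrays, the 200-participant size, and the 2,000 bootstrap resamples are placeholders, not values from the paper.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)
# Placeholder per-participant median gaze errors (degrees); real values would come
# from the crowd-sourced evaluation set.
err_baseline = rng.gamma(shape=4.0, scale=0.6, size=200)
err_distill = err_baseline * rng.uniform(0.3, 0.6, size=200)

# Paired nonparametric test across participants, as proposed in the rebuttal.
stat, p = wilcoxon(err_baseline, err_distill)
print(f"Wilcoxon signed-rank: W={stat:.1f}, p={p:.2e}")

# Bootstrap confidence interval on the relative median-error reduction.
reductions = []
for _ in range(2000):
    idx = rng.integers(0, len(err_baseline), size=len(err_baseline))
    m_b, m_d = np.median(err_baseline[idx]), np.median(err_distill[idx])
    reductions.append(100.0 * (m_b - m_d) / m_b)
lo, hi = np.percentile(reductions, [2.5, 97.5])
print(f"95% CI on median error reduction: [{lo:.1f}%, {hi:.1f}%]")
```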
Circularity Check
No significant circularity; empirical gains measured on external dataset
Full rationale
The paper describes a two-stage distillation process: self-supervised adaptation of a VFM on labeled synthetic plus unlabeled real data, followed by training a lightweight student model. The central performance claim (58.62% median error reduction) is evaluated against synthetic-only baselines on a large external crowd-sourced dataset (>2000 participants). No equations, fitted parameters, or self-citations are shown to reduce this gain to a quantity defined by the inputs themselves. The method is presented as a practical recipe rather than a closed-form derivation, and the reported improvement is falsifiable via the held-out test set. This is the most common honest outcome for an empirical distillation paper.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Self-supervised learning on labeled synthetic and unlabeled real near-eye images can produce a domain-specialized teacher that generalizes to new hardware.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (tag: unclear)
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Linked passage: "DistillGaze proceeds in two stages. First, we adapt a VFM into a domain-specialized teacher using self-supervised learning on labeled synthetic and unlabeled real images... Second, we train an on-device student using both teacher guidance and self-training."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (tag: unclear)
  The relation between the paper passage and the cited Recognition theorem is unclear.
  Linked passage: "We evaluate DistillGaze on a large-scale, crowd-sourced dataset spanning over 2,000 participants"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.