
arxiv: 2604.15348 · v1 · submitted 2026-03-30 · 💻 cs.HC

GazeSync: A Mobile Eye-Tracking Tool for Analyzing Visual Attention on Dynamically Manipulated Content

Pith reviewed 2026-05-14 21:57 UTC · model grok-4.3

classification 💻 cs.HC
keywords: eye-tracking · mobile HCI · visual attention · dynamic content · UI transformations · gaze synchronization · calibration drift

The pith

GazeSync synchronizes on-device gaze estimation with real-time UI transformation matrices to recover image-relative attention on dynamic mobile content.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Conventional mobile eye-tracking maps gaze to fixed screen coordinates, which lose their meaning once users pinch-zoom or rotate images. GazeSync logs gaze data together with the exact scale, rotation, and translation states of the underlying content, so that attention points can be mapped back to the image itself. A formative study with guided manipulation, reading, and visual search tasks shows that the system recovers ground-truth locations more accurately than static baselines. The work also identifies practical limits, such as calibration drift, that appear when multiple transformations occur together.
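To make "mapped back to the image itself" concrete, here is a minimal sketch of recovering an image-relative point from a screen gaze point. It assumes the logged UI state is a 2D similarity transform applied as screen = t + s·R(θ)·image (uniform scale s, rotation θ, translation t). The `UIState` class and both function names are hypothetical, and GazeSync's actual matrix convention (for example, zooming about a focal point, or y-down screen axes flipping the apparent rotation direction) may differ.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class UIState:
    """Hypothetical snapshot of the content transform at one instant.

    Assumed convention: screen = (tx, ty) + scale * R(rotation) @ image.
    """
    scale: float     # uniform zoom factor, must be > 0
    rotation: float  # radians; counter-clockwise in a y-up frame
    tx: float        # translation in screen pixels
    ty: float


def image_to_screen(state: UIState, x: float, y: float) -> Tuple[float, float]:
    """Forward mapping: where an image pixel appears on screen."""
    c, s = math.cos(state.rotation), math.sin(state.rotation)
    return (state.tx + state.scale * (c * x - s * y),
            state.ty + state.scale * (s * x + c * y))


def screen_to_image(state: UIState, gx: float, gy: float) -> Tuple[float, float]:
    """Inverse mapping: which image pixel a screen gaze point falls on."""
    # Undo the translation first, then the rotation (R(-θ) is the transpose
    # of R(θ)), then the scale: the reverse of the forward order.
    dx, dy = gx - state.tx, gy - state.ty
    c, s = math.cos(state.rotation), math.sin(state.rotation)
    return ((c * dx + s * dy) / state.scale,
            (-s * dx + c * dy) / state.scale)
```

For example, with `UIState(scale=2.0, rotation=math.pi / 2, tx=100.0, ty=50.0)`, the image point (10, 0) lands near screen (100, 70), and `screen_to_image` maps that screen point back to approximately (10, 0). Inverting the transform per sample, rather than once per session, is what keeps each gaze point tied to the content state at the moment it was recorded.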

Core claim

GazeSync enables accurate reconstruction of image-relative gaze patterns by pairing on-device gaze estimation with precise real-time UI transformation matrices, thereby decoupling visual attention from fixed device coordinates and outperforming static mapping approaches, while exposing calibration boundaries under compound manipulations.

What carries the argument

GazeSync, the end-to-end mobile system that synchronizes gaze coordinates with live scale, rotation, and translation matrices to reconstruct content-relative attention.
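Synchronization can be pictured as pairing each gaze sample with the UI state in force when it was captured. A minimal sketch, assuming both streams carry timestamps from the same clock and that the UI log is sorted by time; `pair_samples`, `state_at`, and the sample-and-hold policy are illustrative choices, not GazeSync's documented behaviour.

```python
import bisect


def state_at(ui_times, ui_states, t):
    """Return the most recent UI state logged at or before time t, or None."""
    i = bisect.bisect_right(ui_times, t) - 1
    return ui_states[i] if i >= 0 else None


def pair_samples(gaze, ui_log):
    """Attach a UI state to every gaze sample (sample-and-hold).

    gaze:   list of (t_ms, x, y) screen-coordinate samples
    ui_log: list of (t_ms, state) entries, sorted by t_ms
    Samples recorded before the first UI entry are dropped.
    """
    ui_times = [t for t, _ in ui_log]
    ui_states = [s for _, s in ui_log]
    paired = []
    for t, x, y in gaze:
        state = state_at(ui_times, ui_states, t)
        if state is not None:
            paired.append((t, x, y, state))
    return paired
```

Sample-and-hold is the simplest policy; interpolating between the two neighbouring UI states would reduce error during fast pinch or rotate gestures, at the cost of assuming the transform changes smoothly between log entries.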

If this is right

  • Attention patterns can be analyzed during natural pinch-zoom and rotate interactions without losing the semantic reference to the image content.
  • Static coordinate baselines are shown to be inferior for recovering true gaze locations once content transforms.
  • Calibration drift and reconstruction fragility become measurable boundaries when multiple transformations are applied together.
  • The toolchain supports guided manipulation, reading, and visual search tasks on mobile devices.
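The claim that static baselines are inferior can be illustrated on synthetic data. Here a "static" mapping interprets every screen gaze point through the initial, untransformed layout, while the transform-aware mapping uses the state logged with each sample. The function names and the (scale, rot, tx, ty) tuple are hypothetical, the transform convention is an assumption, and the synthetic gaze points are noise-free, so the example isolates mapping error rather than gaze-estimator error.

```python
import math


def screen_to_image(gx, gy, scale, rot, tx, ty):
    """Invert screen = (tx, ty) + scale * R(rot) @ image (assumed convention)."""
    dx, dy = gx - tx, gy - ty
    c, s = math.cos(rot), math.sin(rot)
    return (c * dx + s * dy) / scale, (-s * dx + c * dy) / scale


def compare_to_static(samples, target, initial_state):
    """Mean image-space error of transform-aware vs. static mapping.

    samples:       list of (gx, gy, state), state = (scale, rot, tx, ty)
    target:        ground-truth point in image coordinates
    initial_state: the untransformed layout a static baseline assumes
    """
    aware, static = [], []
    for gx, gy, state in samples:
        aware.append(math.dist(screen_to_image(gx, gy, *state), target))
        static.append(math.dist(screen_to_image(gx, gy, *initial_state), target))
    return sum(aware) / len(aware), sum(static) / len(static)
```

With a target at image point (10, 20) viewed at 2× zoom and shifted by (5, 5) screen pixels, the gaze lands at screen (25, 45); the transform-aware error is 0, while the static error is √850 ≈ 29.2 image pixels.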

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same synchronization approach could be tested on video playback or animated interfaces where content changes continuously.
  • Real-time feedback loops might use the reconstructed gaze to adapt UI elements during user manipulations.
  • Extending the method to multi-user shared screens could reveal how attention shifts when collaborators transform the same content.

Load-bearing premise

On-device gaze estimation can be kept synchronized with precise real-time UI transformation matrices without significant drift or accuracy loss when users perform combined pinch-zoom-rotate actions.
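Why synchronization matters can be estimated with a short worked equation. Assume content is displayed as g = T + s R(θ) p (screen point g, image point p, scale s, rotation θ, translation T), and that during a rotate gesture θ changes at angular velocity ω while s and T stay fixed. If a gaze sample is paired with a UI state that is Δt stale, the reconstruction uses θ − ωΔt instead of θ. Rotating the same vector g − T by two angles that differ by ωΔt moves its tip along a chord, so the image-relative error is:

```latex
\begin{aligned}
\mathbf{p}  &= \tfrac{1}{s}\,R(-\theta)\,(\mathbf{g}-\mathbf{T}), \\
\mathbf{p}' &= \tfrac{1}{s}\,R(-\theta+\omega\,\Delta t)\,(\mathbf{g}-\mathbf{T}), \\
\lVert \mathbf{p}'-\mathbf{p} \rVert &= \frac{2r}{s}\,\sin\!\left(\frac{|\omega\,\Delta t|}{2}\right),
\qquad r = \lVert \mathbf{g}-\mathbf{T} \rVert .
\end{aligned}
```

With hypothetical values (not measurements from the paper) of r = 500 px, ω = 90°/s and Δt = 33 ms, ωΔt ≈ 2.97° and the error is about 26 px at s = 1, or about 13 image px at s = 2. Compound gestures add scale and translation terms on top of this.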

What would settle it

A controlled test that measures a large systematic deviation between GazeSync-reconstructed gaze points and independently verified ground-truth locations on content undergoing simultaneous scaling, rotation, and translation.
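One way to operationalize "large systematic deviation" is to separate the mean error from the mean signed offset (bias): random noise averages out in the bias, whereas drift or a mis-applied transform does not. A minimal sketch; the function name and the returned keys are hypothetical.

```python
import math


def deviation_summary(reconstructed, ground_truth):
    """Separate overall error from systematic bias.

    Both arguments are equal-length lists of (x, y) points in image coordinates.
    """
    if not reconstructed or len(reconstructed) != len(ground_truth):
        raise ValueError("need two non-empty lists of equal length")
    n = len(reconstructed)
    dx = [r[0] - g[0] for r, g in zip(reconstructed, ground_truth)]
    dy = [r[1] - g[1] for r, g in zip(reconstructed, ground_truth)]
    mean_error = sum(math.hypot(a, b) for a, b in zip(dx, dy)) / n
    bias = (sum(dx) / n, sum(dy) / n)  # mean signed offset
    return {
        "mean_error": mean_error,
        "bias": bias,
        "bias_magnitude": math.hypot(*bias),  # never exceeds mean_error
    }
```

If `bias_magnitude` approaches `mean_error` (by the triangle inequality it cannot exceed it), the errors point consistently in one direction, which is the signature of drift rather than scatter. Computing the summary separately for compound and single-axis segments would show whether the bias grows under combined transformations.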

Figures

Figures reproduced from arXiv: 2604.15348 by Juan Ye, Rishab Talwar, Shijing He, Xinya Gong, Xudong Cai, Yaxiong Lei, Yuheng Wang, Zhongliang Guo.

Figure 1. Overview of GazeSync. The system synchronizes on-device gaze estimates with image transformation parameters…
Figure 2. GazeSync architecture. On-device gaze estimation produces screen-coordinate gaze samples, while the Flutter UI…
Figure 3. Screen-space vs. image-relative heatmaps. Under…
Figure 23. Task 5 analysis. The screen-based heat map shows the best accuracy observed across tasks, and the tracking data is clear. Note that an image-based heat map was not generated for this task.
Figure 25. Task 6 analysis. While the raw gaze trace might initially seem inconsistent, when combined with the screen-based heat map the overall pattern aligns well with the expected movement.
Figure 31. Task 11 analysis. The data for this task is harder to interpret because the participant's behaviour was not as strictly guided as in the other tasks, resulting in less predictable gaze patterns.
8.2 Findings. Ground Truth Tasks (Tasks 1–7): these tasks included guided manipulations such as tracing dotted paths, zooming, and rotating images to predefined orientations. The key observations included: • Calibration Rep…
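The contrast in Figure 3 between screen-space and image-relative heatmaps comes down to which coordinate frame the gaze points are binned in. A minimal sketch of the image-relative case, assuming the points have already been mapped into image pixel coordinates; `image_heatmap` is a hypothetical helper, the cell size is an arbitrary illustrative choice, and the Gaussian smoothing usually applied to heatmaps is omitted.

```python
import math


def image_heatmap(points, width, height, cell=50):
    """Bin image-relative gaze points into a rows x cols count grid.

    points: iterable of (x, y) in image pixels; width/height: image size.
    Returns (grid, dropped), where dropped counts points outside the image.
    """
    cols, rows = math.ceil(width / cell), math.ceil(height / cell)
    grid = [[0] * cols for _ in range(rows)]
    dropped = 0
    for x, y in points:
        if 0 <= x < width and 0 <= y < height:
            grid[int(y // cell)][int(x // cell)] += 1
        else:
            dropped += 1
    return grid, dropped
```

Points that fall outside the image (for example, gaze on the background after the content has been zoomed out or panned aside) are counted separately rather than silently clamped to the border.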
Original abstract

Conventional mobile eye-tracking maps gaze to static screen coordinates, failing to capture user attention when content is dynamic. As users pinch, zoom, and rotate images, static coordinates lose their semantic meaning relative to the underlying visual content. To address this methodological gap, we present GazeSync, a reusable mobile system that synchronizes on-device gaze estimation with real-time image transformation matrices (scale, rotation, and translation). By logging gaze coordinates alongside precise UI states, GazeSync enables the accurate reconstruction of image-relative attention patterns, decoupling visual attention from device interaction. We validate our end-to-end toolchain through a formative study involving guided manipulation, reading, and visual search tasks. Our results demonstrate GazeSync's ability to recover ground-truth gaze locations on transforming content, explicitly showing how it outperforms static baselines, while also surfacing critical boundaries regarding calibration drift and reconstruction fragility under compound manipulations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents GazeSync, a reusable mobile system that synchronizes on-device gaze estimation with real-time image transformation matrices (scale, rotation, translation) to reconstruct image-relative gaze locations during dynamic manipulations such as pinch-zoom-rotate. It describes a formative study with guided manipulation, reading, and visual search tasks, claiming that the system recovers ground-truth gaze on transforming content, outperforms static baselines, and identifies boundaries like calibration drift and fragility under compound manipulations.

Significance. If the synchronization accuracy and reconstruction claims hold with supporting metrics, GazeSync would address a clear methodological gap in mobile HCI eye-tracking by enabling attention analysis decoupled from interaction on dynamic content. This could support more ecologically valid studies of visual attention during real-world mobile tasks, with the reusable toolchain as a practical contribution for the community.

major comments (3)
  1. [Abstract] The central claim that GazeSync "recovers ground-truth gaze locations" and "outperforms static baselines" is unsupported by any quantitative error metrics, participant counts, statistical tests, or reconstruction accuracy numbers, leaving the empirical validation only partially described.
  2. [Formative study description] The formative study section does not specify how ground-truth gaze was independently established (e.g., fiducial markers, post-hoc annotation, or eye-tracker calibration validation) nor report error rates specifically for compound pinch-zoom-rotate sequences versus single-axis changes, which directly undermines the outperformance and fragility claims.
  3. [System description] No details are provided on the real-time acquisition of UI transformation matrices, synchronization latency, or any ablation isolating drift accumulation during simultaneous multi-axis manipulations, which is load-bearing for the decoupling of attention from interaction.
minor comments (2)
  1. [Formative study] Clarify the exact number of participants, task durations, and device models used in the study to allow replication.
  2. [Results] Add a figure or table summarizing reconstruction error under different manipulation types to make the results more concrete.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback. We address each major comment below and indicate planned revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The central claim that GazeSync "recovers ground-truth gaze locations" and "outperforms static baselines" is unsupported by any quantitative error metrics, participant counts, statistical tests, or reconstruction accuracy numbers, leaving the empirical validation only partially described.

    Authors: We agree that the abstract summarizes results at a high level without quantitative metrics. The full manuscript reports reconstruction errors and baseline comparisons in the evaluation section, but we will revise the abstract to include key metrics (e.g., mean error, participant count, and statistical comparisons) so the central claims are substantiated at the abstract level. revision: yes

  2. Referee: [Formative study description] The formative study section does not specify how ground-truth gaze was independently established (e.g., fiducial markers, post-hoc annotation, or eye-tracker calibration validation) nor report error rates specifically for compound pinch-zoom-rotate sequences versus single-axis changes, which directly undermines the outperformance and fragility claims.

    Authors: We acknowledge this gap in the description. Ground truth was established via post-hoc annotation aligned to the applied transformation matrices. We will revise the formative study section to explicitly describe this process and add separate error-rate breakdowns for compound versus single-axis manipulations to better support the outperformance and fragility claims. revision: yes

  3. Referee: [System description] No details are provided on the real-time acquisition of UI transformation matrices, synchronization latency, or any ablation isolating drift accumulation during simultaneous multi-axis manipulations, which is load-bearing for the decoupling of attention from interaction.

    Authors: We agree these implementation details are necessary for reproducibility. We will expand the system description to cover real-time matrix acquisition from the mobile UI framework, report measured synchronization latency, and include an ablation isolating drift under simultaneous multi-axis manipulations. revision: yes
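The compound-versus-single-axis breakdown promised in the second response could be computed by labelling each interval according to which transform components changed. A minimal sketch, assuming UI states are logged as hypothetical (scale, rotation_rad, tx, ty) tuples; the helper names and thresholds are illustrative, not values from the paper.

```python
import math
from collections import defaultdict


def manipulation_type(prev, cur, scale_eps=1e-3, rot_eps=1e-3, pan_eps=0.5):
    """Label the change between two (scale, rotation_rad, tx, ty) states."""
    zoomed = abs(cur[0] - prev[0]) > scale_eps
    rotated = abs(cur[1] - prev[1]) > rot_eps
    panned = math.hypot(cur[2] - prev[2], cur[3] - prev[3]) > pan_eps
    if zoomed and rotated:
        return "compound"
    if zoomed or rotated:
        # Focal-point zooms and rotations also shift the translation,
        # so translation is not counted as an extra axis here.
        return "single"
    return "pan" if panned else "static"


def error_by_type(records):
    """records: list of (prev_state, cur_state, error) -> mean error per label."""
    sums, counts = defaultdict(float), defaultdict(int)
    for prev, cur, err in records:
        label = manipulation_type(prev, cur)
        sums[label] += err
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}
```

Translation is deliberately not treated as an extra axis when scale or rotation also changes, because a pinch or rotate about the fingers' focal point shifts the translation as a side effect; counting it would misclassify ordinary zooms as compound manipulations. Pure pans are reported under their own label.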

Circularity Check

0 steps flagged

No circularity: system description plus empirical validation with no derivations or self-referential reductions

Full rationale

The paper presents GazeSync as a mobile system that logs gaze coordinates alongside UI transformation matrices (scale, rotation, translation) to reconstruct image-relative attention, validated via a formative study of guided manipulation, reading, and visual search tasks. No equations, fitted parameters, derivations, or self-citations appear in the abstract or described content. The central claim of outperforming static baselines in recovering ground-truth gaze locations rests on direct empirical comparison rather than any reduction to self-defined quantities by construction. The analysis is therefore self-contained with no load-bearing steps that collapse to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper contributes a new engineering system rather than a mathematical derivation. It relies on standard assumptions about mobile device APIs providing real-time transformation matrices and on-device gaze estimation being available.

axioms (1)
  • domain assumption: Mobile device APIs expose real-time image transformation matrices (scale, rotation, translation) that can be logged synchronously with gaze data
    Invoked to enable the core synchronization step described in the abstract.
invented entities (1)
  • GazeSync system (no independent evidence)
    purpose: Synchronizing gaze estimation with live image transformations for image-relative attention reconstruction
    The system itself is the novel artifact introduced by the paper.
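The domain assumption above can be made concrete. If the UI framework exposes the content transform as a column-major 4×4 matrix, which is how Flutter's `Matrix4` (from the `vector_math` package) stores its 16 entries, a pure scale-rotate-translate transform can be decomposed from that storage list. A minimal sketch, assuming uniform scale, no shear or flip, and rotation about the z-axis only; `decompose_similarity` and `compose_similarity` are hypothetical helpers, and GazeSync may log the parameters directly instead.

```python
import math


def decompose_similarity(m):
    """Split a column-major 4x4 matrix (16 floats) into (scale, rotation_rad, tx, ty).

    Assumes the upper-left 2x2 block is s * [[cos θ, -sin θ], [sin θ, cos θ]].
    """
    if len(m) != 16:
        raise ValueError("expected 16 matrix entries")
    a, b = m[0], m[1]       # first column: (s*cos θ, s*sin θ)
    scale = math.hypot(a, b)
    rotation = math.atan2(b, a)
    tx, ty = m[12], m[13]   # fourth column holds the translation
    return scale, rotation, tx, ty


def compose_similarity(scale, rotation, tx, ty):
    """Build the column-major storage list for the same transform (for testing)."""
    a, b = scale * math.cos(rotation), scale * math.sin(rotation)
    return [a, b, 0.0, 0.0,
            -b, a, 0.0, 0.0,
            0.0, 0.0, 1.0, 0.0,
            tx, ty, 0.0, 1.0]
```

A round trip such as `decompose_similarity(compose_similarity(2.0, math.pi / 6, 30.0, 40.0))` returns approximately (2.0, π/6, 30.0, 40.0); a matrix with non-uniform scale or shear would need a fuller decomposition.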

pith-pipeline@v0.9.0 · 5484 in / 1271 out tokens · 39676 ms · 2026-05-14T21:57:42.792777+00:00 · methodology


Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages

  1. [1]

    Andreas Bulling and Hans Gellersen. 2010. Toward mobile eye-based human-computer interaction. IEEE Pervasive Computing 9, 4 (2010), 8–12

  2. [2]

    Zhuojiang Cai, Jingkai Hong, Zhimin Wang, and Feng Lu. 2025. GazeSwipe: Enhancing Mobile Touchscreen Reachability through Seamless Gaze and Finger-Swipe Integration. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–14

  3. [3]

    Yihua Cheng, Haofei Wang, Yiwei Bao, and Feng Lu. 2024. Appearance-based gaze estimation with deep learning: A review and benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence 46, 12 (2024), 7509–7528

  4. [4]

    Francisco Diaz-Guerra and Angel Jimenez-Molina. 2023. Continuous Prediction of Web User Visual Attention on Short Span Windows Based on Gaze Data Analytics. Sensors 23, 4 (2023), 2294

  5. [5]

    Andrew T. Duchowski. 2017. Eye tracking methodology: Theory and practice. Springer

  6. [6]

    Leonela González-Vides, José Luis Hernández-Verdejo, and Pilar Cañadas-Suárez

  7. [7]

    Eye tracking in optometry: A systematic review. Journal of Eye Movement Research 16, 3 (2023), 10–16910

  8. [8]

    Elias Daniel Guestrin and Moshe Eizenman. 2006. General theory of remote gaze estimation using the pupil center and corneal reflections. IEEE Transactions on Biomedical Engineering 53, 6 (2006), 1124–1133

  9. [9]

    Nishan Gunawardena, Jeewani Anupama Ginige, Bahman Javadi, and Gough Lui. 2024. Deep learning based eye tracking on smartphones for dynamic visual stimuli. Procedia Computer Science 246 (2024), 3733–3742

  10. [10]

    Halszka Jarodzka, Kenneth Holmqvist, and Hans Gruber. 2017. Eye tracking in Educational Science: Theoretical frameworks and research agendas. Journal of Eye Movement Research 10, 1 (2017), 10–16910

  11. [11]

    Shijing He, Yaxiong Lei, Zihan Zhang, Yuzhou Sun, Shujun Li, Chi Zhang, and Juan Ye. 2025. Identity deepfake threats to biometric authentication systems: Public and expert perspectives. arXiv preprint arXiv:2506.06825 (2025)

  12. [12]

    Christina Katsini, Yasmeen Abdrabou, George E Raptis, Mohamed Khamis, and Florian Alt. 2020. The role of eye gaze in security and privacy applications: Survey and future HCI research directions. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–21

  13. [13]

    Kyle Krafka, Aditya Khosla, Petr Kellnhofer, Harini Kannan, Suchendra Bhandarkar, Wojciech Matusik, and Antonio Torralba. 2016. Eye tracking for everyone. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2176–2184

  14. [14]

    Yaxiong Lei. 2021. Eye tracking calibration on mobile devices. In ACM Symposium on Eye Tracking Research and Applications. 1–4

  15. [15]

    Yaxiong Lei, Xinya Gong, Shijing He, Yafei Wang, Mohamed Khamis, and Juan Ye. 2026. The People’s Gaze: Co-Designing and Refining Gaze Gestures with Users and Experts. In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems

  16. [16]

    Yaxiong Lei, Shijing He, Huining Feng, Kaixing Zhao, Mohamed Khamis, and Juan Ye. 2023. Protecting Privacy in an Era of Pervasive Camera-Based Devices: Challenges and Potential Directions. In Proc. UK Mobile, Wearable and Ubiquitous…
    GazeSync: A Mobile Eye-Tracking Tool for Analyzing Visual Attention on Dynamically Manipulated Content · CHI EA ’26, April 13–1...

  17. [17]

    Yaxiong Lei, Shijing He, Mohamed Khamis, and Juan Ye. 2023. An end-to-end review of gaze estimation and its interactive applications on handheld mobile devices. Comput. Surveys 56, 2 (2023), 1–38

  18. [18]

    Yaxiong Lei, Yuheng Wang, Fergus Buchanan, Mingyue Zhao, Yusuke Sugano, Shijing He, Mohamed Khamis, and Juan Ye. 2025. Quantifying the impact of motion on 2D gaze estimation in real-world mobile interactions. arXiv preprint arXiv:2502.10570 (2025)

  19. [19]

    Yaxiong Lei, Yuheng Wang, Tyler Caslin, Alexander Wisowaty, Xu Zhu, Mohamed Khamis, and Juan Ye. 2023. DynamicRead: Exploring robust gaze interaction methods for reading on handheld mobile devices under dynamic conditions. Proceedings of the ACM on Human-Computer Interaction 7, ETRA (2023), 1–17

  20. [20]

    Yaxiong Lei, Mingyue Zhao, Yuheng Wang, Shijing He, Yusuke Sugano, Yafei Wang, Kaixing Zhao, Mohamed Khamis, and Juan Ye. 2025. MAC-Gaze: Motion-Aware Continual Calibration for Mobile Gaze Tracking. arXiv preprint arXiv:2505.22769 (2025)

  21. [21]

    Julien Mercier, Olivier Ertz, and Erwan Bocher. 2024. Quantifying dwell time with location-based augmented reality: Dynamic AOI analysis on mobile eye tracking data with vision transformer. Journal of Eye Movement Research 17, 3 (2024), 10–16910

  22. [22]

    Omar Namnakani. 2023. Gaze-based Interaction on Handheld Mobile Devices. In Proceedings of the 2023 Symposium on Eye Tracking Research and Applications. 1–4

  23. [23]

    Yun Suen Pai, Benjamin Tag, Benjamin Outram, Noriyasu Vontin, Kazunori Sugiura, and Kai Kunze. 2016. GazeSim: simulating foveated rendering using depth in eye gaze for VR. In ACM SIGGRAPH 2016 Posters. 1–2

  24. [24]

    Argenis Ramirez Ramirez Gomez, Christopher Clarke, Ludwig Sidenmark, and Hans Gellersen. 2021. Gaze+Hold: eyes-only direct manipulation with continuous gaze modulated by closure of one eye. In ACM Symposium on Eye Tracking Research and Applications. 1–12

  25. [25]

    Aaron Ruß. 2011. Modeling visual attention for rule-based usability simulations of elderly citizen. In International Conference on Engineering Psychology and Cognitive Ergonomics. Springer, 72–81

  26. [26]

    Sophie Stellmach and Raimund Dachselt. 2012. Investigating gaze-supported multimodal pan and zoom. In Proceedings of the Symposium on Eye Tracking Research and Applications. 357–360

  27. [27]

    Adam Strupczewski, Błażej Czupryński, Jacek Naruniec, and Kamil Mucha. 2016. Geometric eye gaze tracking. In International Conference on Computer Vision Theory and Applications, Vol. 4. SCITEPRESS, 444–455

  28. [28]

    Yusuke Sugano, Yasuyuki Matsushita, Yoichi Sato, and Hideki Koike. 2015. Appearance-based gaze estimation with online calibration from mouse operations. IEEE Transactions on Human-Machine Systems 45, 6 (2015), 750–760

  29. [29]

    Hsin-Pei Sun, Cheng-Hsun Yang, and Shang-Hong Lai. 2017. A deep learning approach to appearance-based gaze estimation under head pose variations. In 2017 4th IAPR Asian Conference on Pattern Recognition (ACPR). IEEE, 935–940

  30. [30]

    Mohammed Tahri Sqalli, Begali Aslonov, Mukhammadjon Gafurov, Nurmukhammad Mukhammadiev, and Yahya Sqalli Houssaini. 2023. Eye tracking technology in medical practice: a perspective on its diverse applications. Frontiers in Medical Technology 5 (2023), 1253001

  31. [31]

    Nachiappan Valliappan, Na Dai, Ethan Steinberg, Junfeng He, Kantwon Rogers, Venky Ramachandran, Pingmei Xu, Mina Shojaeizadeh, Li Guo, Kai Kohlhoff, et al. 2020. Accelerating eye movement research via accurate and affordable smartphone eye tracking. Nature Communications 11, 1 (2020), 4553

  32. [32]

    VisualCamp Co., Ltd. [n. d.]. Eyedid SDK | For Developer. https://sdk.eyedid.ai/. Accessed: 2025-08-19

  33. [33]

    Xucong Zhang, Yusuke Sugano, Mario Fritz, and Andreas Bulling. 2015. Appearance-based gaze estimation in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4511–4520

  34. [34]

    Xiaolong Zhou, Haibin Cai, Zhanpeng Shao, Hui Yu, and Honghai Liu. 2016. 3D eye model-based gaze estimation from a depth sensor. In 2016 IEEE International Conference on Robotics and Biomimetics (ROBIO). IEEE, 369–374