Pith · machine review for the scientific record

arXiv: 2604.09659 · v1 · submitted 2026-03-30 · 💻 cs.HC

Recognition: 2 Lean theorem links

GazeCode: Recall-Based Verification for Higher-Quality In-the-Wild Mobile Gaze Data Collection

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 21:53 UTC · model grok-4.3

classification 💻 cs.HC
keywords mobile gaze estimation · in-the-wild data collection · recall verification · label validity · peripheral vision · gaze data quality · attention confirmation

The pith

GazeCode verifies true foveation during mobile gaze recording by requiring users to recall multi-digit codes presented as brief, low-opacity stimuli.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GazeCode to address label noise in large-scale mobile gaze datasets collected outside controlled labs. It replaces weak validation probes with a multi-digit recall task that drops the odds of random guessing to one in 10^N, and pairs it with small, low-contrast digits that are hard to read in peripheral vision. Formative tests indicate these stimuli remain readable under direct fixation but block peripheral success, so correct recall serves as evidence of attentive looking. The system also records synchronized front-camera video, motion data, and precise timestamps for later analysis. The result is a practical way to gather higher-confidence gaze labels without constant oversight.

Core claim

GazeCode is a recall-based verification paradigm for higher-confidence in-the-wild mobile gaze data collection. It strengthens label validity through a multi-digit recall task that reduces the random success probability to 10^{-N}, paired with an anti-peripheral stimulus design that uses small, low-contrast, brief digits. In a formative study, the low-opacity digits substantially reduced peripheral readability while remaining usable for attentive foveation, supporting the inference that correct recall corresponds to higher-confidence gaze labels.
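
As a quick sanity check on the quoted bound, the 10^{-N} figure follows directly from N independently and uniformly drawn digits (an assumption the paper's figure implies); a four-digit code, for example, leaves a one-in-ten-thousand chance of guessing correctly:

    \[
      P(\text{random success}) = \left(\tfrac{1}{10}\right)^{N} = 10^{-N},
      \qquad N = 4 \;\Rightarrow\; 10^{-4} = 0.0001 .
    \]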

What carries the argument

The recall-based verification paradigm that combines a multi-digit code task with low-opacity brief stimuli to link successful recall to direct foveation.
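
A minimal sketch of that verification loop, assuming digits are drawn independently and shown one at a time at random screen locations. The opacity echoes the 0.1 level reported in the formative results; the code length and per-digit duration below are hypothetical placeholders, not values from the paper, and the renderer is a print stub rather than the authors' implementation.

    # Hedged sketch of a GazeCode-style recall trial, not the authors' implementation.
    import random
    import time

    N_DIGITS = 4      # code length N; random guessing succeeds with probability 10**-N_DIGITS
    OPACITY = 0.1     # low-contrast rendering level probed in the formative study
    DURATION_S = 0.5  # hypothetical per-digit on-screen time

    def show_digit(digit: int, x: float, y: float) -> None:
        # Stand-in for the real renderer: draw `digit` at normalized position (x, y)
        # with OPACITY for DURATION_S seconds, then clear the screen.
        print(f"[stimulus] digit {digit} at ({x:.2f}, {y:.2f}), opacity {OPACITY}, {DURATION_S}s")
        time.sleep(DURATION_S)

    def run_trial() -> bool:
        code = [random.randint(0, 9) for _ in range(N_DIGITS)]
        for d in code:
            show_digit(d, random.random(), random.random())  # random location per digit
        answer = input(f"Type the {N_DIGITS}-digit code you saw: ").strip()
        return answer == "".join(map(str, code))  # correct recall -> keep this trial's gaze data

    if __name__ == "__main__":
        print("trial verified" if run_trial() else "trial rejected")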

If this is right

  • Random-guessing success drops to 10^{-N} for an N-digit code.
  • Low-opacity, brief digits make peripheral reading unreliable while preserving central readability.
  • Synchronized logging of video, IMU, and target events supports post-collection validation (a logging sketch follows this list).
  • Design guidelines for stimulus opacity and duration follow directly from the parameter tests.
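
As referenced above, a rough sketch of what synchronized, timestamped logging and post-collection filtering could look like. The schema and field names are illustrative assumptions, not the authors' actual format; the point is that all three streams share one clock and that gaze labels are kept only from trials with correct recall.

    # Illustrative logging sketch; field names and payloads are assumptions.
    import json
    import time
    from dataclasses import dataclass, asdict

    @dataclass
    class Event:
        t_ns: int      # high-resolution timestamp from one shared monotonic clock
        stream: str    # "camera_frame", "imu", or "target"
        payload: dict

    log: list[Event] = []

    def record(stream: str, payload: dict) -> None:
        log.append(Event(time.monotonic_ns(), stream, payload))

    # All three streams stamp against the same clock during a trial:
    record("target", {"trial": 1, "digit": 7, "x": 0.62, "y": 0.18})
    record("imu", {"trial": 1, "ax": 0.01, "ay": -0.02, "az": 9.81})
    record("camera_frame", {"trial": 1, "frame_id": 1042})
    record("camera_frame", {"trial": 2, "frame_id": 1043})

    # Post-collection validation: keep gaze labels only from trials with correct recall.
    recall_ok = {1: True, 2: False}  # trial id -> outcome of the code-recall task
    kept = [asdict(e) for e in log if recall_ok.get(e.payload["trial"], False)]
    print(json.dumps(kept, indent=2))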

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same recall structure could be adapted to verify attention in other mobile sensing tasks such as reading or navigation prompts.
  • Combining the method with existing gaze estimation models might enable automatic down-weighting of uncertain samples during training (sketched after this list).
  • Scaling the approach to larger and more diverse user groups would test whether the peripheral-blocking effect holds across ages and lighting conditions.
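
To make the second point concrete, here is one hypothetical way a trainer could down-weight samples whose recall verification failed. The weighting factor and squared-error loss are editorial guesses, not anything proposed in the paper.

    # Editorial sketch only: down-weighting gaze samples that failed recall verification.
    import numpy as np

    def weighted_gaze_loss(pred, target, verified):
        """Weighted mean squared gaze error; unverified samples count less."""
        pred = np.asarray(pred, dtype=float)
        target = np.asarray(target, dtype=float)
        w = np.where(np.asarray(verified), 1.0, 0.2)     # hypothetical down-weight factor
        per_sample = ((pred - target) ** 2).sum(axis=1)  # squared error per gaze point
        return float((w * per_sample).sum() / w.sum())

    pred = [[0.40, 0.55], [0.10, 0.90]]    # predicted on-screen gaze (normalized)
    target = [[0.42, 0.50], [0.50, 0.50]]  # logged target positions
    verified = [True, False]               # recall-check outcome per sample
    print(weighted_gaze_loss(pred, target, verified))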

Load-bearing premise

Correct recall of the code means the user looked directly at the target rather than succeeding through peripheral vision or other strategies.

What would settle it

A controlled test with more participants instructed to avoid direct fixation, measuring whether recall accuracy stays low under peripheral viewing conditions.
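
One way such a test could be analyzed, as a hedged sketch with made-up counts: compare recall accuracy under peripheral-only viewing against the 10^{-N} chance level with a one-sided binomial test.

    # Analysis sketch for the proposed control; the counts are invented for illustration.
    from scipy.stats import binomtest

    N_DIGITS = 4
    chance = 10 ** -N_DIGITS     # random-guess success probability for an N-digit code
    correct, trials = 1, 60      # hypothetical peripheral-viewing outcomes

    result = binomtest(correct, trials, p=chance, alternative="greater")
    print(f"recall accuracy {correct / trials:.3f}, p-value vs. chance {result.pvalue:.3g}")
    # A small p-value would mean peripheral viewing succeeds above chance, which would
    # weaken the premise that correct recall implies direct fixation.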

Figures

Figures reproduced from arXiv: 2604.09659 by Juan Ye, Shijing He, Thomas Davies, Xinya Gong, Yaxiong Lei.

Figure 1: Overview of the GazeCode workflow. (1) Users first calibrate device orientation to ensure diverse gaze angles. (2) …
Figure 2: GazeCode flow. (a) Random device orientation is enforced. (b) Digits appear sequentially at random screen locations. …
Figure 3: Formative results. Low opacity (0.1) significantly increases the time required to read digits and drastically reduces peripheral legibility …
Original abstract

Large-scale mobile gaze estimation relies on in-the-wild datasets, yet unsupervised collection makes it difficult to verify whether participants truly foveate logged targets. Prior mobile protocols often use low-entropy validation (e.g., binary probes) that can be satisfied by guessing and may still allow peripheral viewing, introducing label noise. We present GazeCode, a recall-based verification paradigm for higher-confidence in-the-wild mobile gaze data collection that strengthens label validity through a multi-digit recall task (reducing random success to 10^{-N}) paired with anti-peripheral stimulus design (small, low-contrast, brief digits). The system logs synchronized front-camera video, IMU streams, and target events using high-resolution timestamps. In a formative study (N=3), we probe key parameters (opacity, duration) and directly test peripheral exploitability using an eccentricity-controlled RING condition. Results show that low-opacity digits substantially reduce peripheral readability while remaining usable for attentive foveation, supporting the inference that correct recall corresponds to higher-confidence gaze labels. We conclude with actionable design guidelines for robust in-the-wild gaze data collection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces GazeCode, a recall-based verification paradigm for in-the-wild mobile gaze data collection that pairs a multi-digit recall task (reducing random success probability to 10^{-N}) with anti-peripheral stimulus design (small, low-opacity, brief digits). A formative study (N=3) probes parameters such as digit opacity and duration and tests peripheral exploitability via an eccentricity-controlled RING condition, showing that low-opacity digits reduce peripheral readability while remaining usable for foveation, thereby supporting higher-confidence gaze labels.

Significance. If the core inference holds under larger-scale validation, GazeCode could meaningfully improve label validity in mobile gaze datasets by making peripheral viewing and guessing strategies less viable, offering concrete design guidelines for unsupervised data collection in HCI and eye-tracking research.

major comments (2)
  1. [Formative study / Results] Formative study (N=3): the sample provides initial parameter probing and eccentricity test results but lacks statistical power, quantitative metrics (e.g., accuracy rates with confidence intervals), power analysis, or controls for individual differences in peripheral acuity, which directly undermines the central claim that correct recall reliably signals foveation rather than partial peripheral success.
  2. [RING condition / Results] RING eccentricity condition: while the design shows reduced peripheral readability at low opacity, the N=3 results are reported without per-participant breakdowns or tests for generalizability, leaving the separation between conditions vulnerable to idiosyncratic effects rather than establishing a robust anti-peripheral property.
minor comments (2)
  1. [Abstract] Abstract: the claim of 'higher-confidence gaze labels' should be qualified as provisional pending larger validation, given the formative nature of the evidence.
  2. [Introduction / Method] Notation: clarify whether '10^{-N}' assumes uniform random guessing or accounts for possible partial recall strategies.
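
An editorial illustration of that second minor point (not a claim from the paper): if a participant can read k of the N digits peripherally and guesses the remaining digits uniformly, the success probability rises from 10^{-N} to

    \[
      P(\text{success} \mid k \text{ digits read peripherally}) = 10^{-(N-k)},
    \]

so the full 10^{-N} bound holds only when no digit is peripherally legible, which is precisely what the anti-peripheral stimulus design is intended to enforce.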

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below, clarifying the exploratory nature of the formative study and incorporating additional result details to improve transparency.

point-by-point responses
  1. Referee: [Formative study / Results] Formative study (N=3): the sample provides initial parameter probing and eccentricity test results but lacks statistical power, quantitative metrics (e.g., accuracy rates with confidence intervals), power analysis, or controls for individual differences in peripheral acuity, which directly undermines the central claim that correct recall reliably signals foveation rather than partial peripheral success.

    Authors: We agree that the N=3 formative study lacks statistical power, confidence intervals, power analysis, and controls for individual peripheral acuity differences. The study was intended as an initial parameter probe rather than a confirmatory test. The core claim that correct recall indicates foveation rests primarily on the multi-digit task reducing random success probability to 10^{-N} together with the anti-peripheral stimulus properties, not on inferential statistics from this small sample. In revision we now report raw per-participant accuracy rates, explicitly describe the study as exploratory, and add a limitations paragraph noting the absence of acuity controls and the need for larger-scale validation. revision: partial

  2. Referee: [RING condition / Results] RING eccentricity condition: while the design shows reduced peripheral readability at low opacity, the N=3 results are reported without per-participant breakdowns or tests for generalizability, leaving the separation between conditions vulnerable to idiosyncratic effects rather than establishing a robust anti-peripheral property.

    Authors: We accept that the original submission omitted per-participant breakdowns, which weakens assessment of consistency. The revised manuscript now includes a table presenting individual participant accuracy for the RING versus foveal conditions at each opacity level, revealing a consistent directional pattern. While N=3 precludes claims of broad generalizability, the observed separation supports the stimulus design choices. We have updated the discussion to state that these results are preliminary and that larger studies are required to confirm the anti-peripheral property. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical formative study with direct observational support

full rationale

The paper proposes GazeCode as a recall-based verification method for mobile gaze data, supported by a small N=3 formative study that directly tests peripheral readability via the RING eccentricity condition. No equations, fitted parameters, predictions, or self-citations appear in the derivation chain. The inference that correct recall indicates foveation rests on observed separation between conditions rather than any reduction to inputs by construction. This is a standard empirical design contribution with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The method depends on the domain assumption that recall accuracy serves as a proxy for foveation and on the empirical tuning of stimulus parameters.

free parameters (2)
  • digit opacity
    Probed in formative study to balance usability and anti-peripheral effect
  • digit duration
    Probed in formative study to balance usability and anti-peripheral effect
axioms (1)
  • domain assumption: correct recall of the multi-digit code requires foveation of the target
    Core premise linking task success to gaze label validity

pith-pipeline@v0.9.0 · 5517 in / 1178 out tokens · 48518 ms · 2026-05-14T21:53:51.773826+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 1 internal anchor

  1. [1] Riku Arakawa, Mayank Goel, Chris Harrison, and Karan Ahuja. 2022. RGBDGaze: Gaze tracking on smartphones with RGB and depth data. In Proceedings of the 2022 International Conference on Multimodal Interaction. 329–336.
  2. [2] Jeroen S Benjamins, Roy S Hessels, and Ignace TC Hooge. 2018. GazeCode: Open-source software for manual mapping of mobile eye-tracking data. In Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications. 1–4.
  3. [3] Birgitta Burger, Anna Puupponen, and Tommi Jantunen. 2018. Synchronizing eye tracking and optical motion capture: How to bring them together. Journal of Eye Movement Research 11, 2 (2018), 10–16910.
  4. [4] Zhang Cheng and Yanxia Wang. 2024. Lightweight Gaze Estimation Model Via Fusion Global Information. In 2024 International Joint Conference on Neural Networks (IJCNN). IEEE, 1–8.
  5. [5] Claudia Damiano and Dirk B Walther. 2019. Distinct roles of eye movements during memory encoding and retrieval. Cognition 184 (2019), 119–129.
  6. [6] Donald V DeRosa and Robert E Morin. 1970. Recognition reaction time for digits in consecutive and nonconsecutive memorized sets. Journal of Experimental Psychology 83, 3p1 (1970), 472.
  7. [7] Mayar Elfares, Pascal Reisert, Ralf Küsters, and Andreas Bulling. 2025. QualitEye: Public and Privacy-preserving Gaze Data Quality Verification. arXiv preprint arXiv:2506.05908 (2025).
  8. [8–9] Shreya Ghosh, Abhinav Dhall, Munawar Hayat, Jarrod Knibbe, and Qiang Ji. 2023. Automatic gaze analysis: A survey of deep learning based approaches. IEEE Transactions on Pattern Analysis and Machine Intelligence 46, 1 (2023), 61–84.
  10. [10] James Gosling. 2000. The Java Language Specification. Addison-Wesley Professional.
  11. [11] Jun Han Yio and John L Santa. 1970. Reaction time in short-term recognition with digits and letters. Psychonomic Science 20, 2 (1970), 121–122.
  12. [12] Shijing He, Yaxiong Lei, Zihan Zhang, Yuzhou Sun, Shujun Li, Chi Zhang, and Juan Ye. 2025. Identity deepfake threats to biometric authentication systems: Public and expert perspectives. arXiv preprint arXiv:2506.06825 (2025).
  13. [13] Pei-Yun Hsueh, Prem Melville, and Vikas Sindhwani. 2009. Data quality from crowdsourcing: a study of annotation selection criteria. In Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing. 27–35.
  14. [14] John Jonides, Richard L Lewis, Derek Evan Nee, Cindy A Lustig, Marc G Berman, and Katherine Sledge Moore. 2008. The mind and brain of short-term memory. Annual Review of Psychology 59, 1 (2008), 193–224.
  15. [15] Christina Katsini, Yasmeen Abdrabou, George E Raptis, Mohamed Khamis, and Florian Alt. 2020. The role of eye gaze in security and privacy applications: Survey and future HCI research directions. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–21.
  16. [16] Petr Kellnhofer, Adria Recasens, Simon Stent, Wojciech Matusik, and Antonio Torralba. 2019. Gaze360: Physically unconstrained gaze estimation in the wild. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 6912–6921.
  17. [17] Kyle Krafka, Aditya Khosla, Petr Kellnhofer, Harini Kannan, Suchendra Bhandarkar, Wojciech Matusik, and Antonio Torralba. 2016. Eye tracking for everyone. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2176–2184.
  18. [18] Yaxiong Lei, Xinya Gong, Shijing He, Yafei Wang, Mohamed Khamis, and Juan Ye. 2026. The People's Gaze: Co-Designing and Refining Gaze Gestures with Users and Experts. In Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems.
  19. [19] Yaxiong Lei, Shijing He, Huining Feng, Kaixing Zhao, Mohamed Khamis, and Juan Ye. 2023. Protecting Privacy in an Era of Pervasive Camera-Based Devices: Challenges and Potential Directions. In Proc. UK Mobile, Wearable and Ubiquitous Systems Research Symposium.
  20. [20] Yaxiong Lei, Shijing He, Mohamed Khamis, and Juan Ye. 2023. An end-to-end review of gaze estimation and its interactive applications on handheld mobile devices. Comput. Surveys 56, 2 (2023), 1–38.
  21. [21] Yaxiong Lei, Yuheng Wang, Fergus Buchanan, Mingyue Zhao, Yusuke Sugano, Shijing He, Mohamed Khamis, and Juan Ye. 2025. Quantifying the impact of motion on 2D gaze estimation in real-world mobile interactions. arXiv preprint arXiv:2502.10570 (2025).
  22. [22] Yaxiong Lei, Yuheng Wang, Tyler Caslin, Alexander Wisowaty, Xu Zhu, Mohamed Khamis, and Juan Ye. 2023. DynamicRead: Exploring robust gaze interaction methods for reading on handheld mobile devices under dynamic conditions. Proceedings of the ACM on Human-Computer Interaction 7, ETRA (2023), 1–17.
  23. [23] Yaxiong Lei, Mingyue Zhao, Yuheng Wang, Shijing He, Yusuke Sugano, Yafei Wang, Kaixing Zhao, Mohamed Khamis, and Juan Ye. 2025. MAC-Gaze: Motion-Aware Continual Calibration for Mobile Gaze Tracking. arXiv preprint arXiv:2505.22769 (2025).
  24. [24] Dongze Lian, Ziheng Zhang, Weixin Luo, Lina Hu, Minye Wu, Zechao Li, Jingyi Yu, and Shenghua Gao. 2019. RGBD based gaze estimation via multi-task CNN. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 2488–2495.
  25. [25] Chiron AT Oderkerk and Sofie Beier. 2022. Fonts of wider letter shapes improve letter recognition in parafovea and periphery. Ergonomics 65, 5 (2022), 753–761.
  26. [26] John Paul Plummer, Alex Chaparro, and Rui Ni. 2022. Effect of target contrast and divided attention on the useful field of view. Vision Research 197 (2022), 108050.
  27. [27] Camilla Funch Staugaard, Anders Petersen, and Signe Vangkilde. 2016. Eccentricity effects in vision and attention. Neuropsychologia 92 (2016), 69–78.
  28. [28] Emma EM Stewart, Matteo Valsecchi, and Alexander C Schütz. 2020. A review of interactions between peripheral and foveal vision. Journal of Vision 20, 12 (2020), 2–2.
  29. [29] Hans Strasburger, Ingo Rentschler, and Martin Jüttner. 2011. Peripheral vision and pattern recognition: A review. Journal of Vision 11, 5 (2011), 13–13.
  30. [30] MJ Taylor, RHS Carpenter, and AJ Anderson. 2006. A noisy transform predicts saccadic and manual reaction times to changes in contrast. The Journal of Physiology 573, 3 (2006), 741–751.
  31. [31] Nachiappan Valliappan, Na Dai, Ethan Steinberg, Junfeng He, Kantwon Rogers, Venky Ramachandran, Pingmei Xu, Mina Shojaeizadeh, Li Guo, Kai Kohlhoff, et al. 2020. Accelerating eye movement research via accurate and affordable smartphone eye tracking. Nature Communications 11, 1 (2020), 4553.
  32. [32–33] Abinaya Priya Venkataraman, Peter Lewis, Peter Unsbo, and Linda Lundström. 2017. Peripheral resolution and contrast sensitivity: effects of stimulus drift. Vision Research 133 (2017), 145–149.
  34. [34] Pingmei Xu, Krista A Ehinger, Yinda Zhang, Adam Finkelstein, Sanjeev R Kulkarni, and Jianxiong Xiao. 2015. TurkerGaze: Crowdsourcing saliency with webcam based eye tracking. arXiv preprint arXiv:1504.06755 (2015).
  35. [35] Songzhou Yang, Meng Jin, and Yuan He. 2022. Continuous gaze tracking with implicit saliency-aware calibration on mobile devices. IEEE Transactions on Mobile Computing 22, 10 (2022), 5816–5828.
  36. [36] Mingtao Yue, Tomomi Sayuda, Miles Pennington, and Yusuke Sugano. 2025. Evaluating user experience and data quality in gamified data collection for appearance-based gaze estimation. International Journal of Human–Computer Interaction 41, 12 (2025), 7549–7565.
  37. [37] Xucong Zhang, Seonwook Park, Thabo Beeler, Derek Bradley, Siyu Tang, and Otmar Hilliges. 2020. ETH-XGaze: A large scale dataset for gaze estimation under extreme head pose and gaze variation. In European Conference on Computer Vision. Springer, 365–381.
  38. [38] Xucong Zhang, Yusuke Sugano, Mario Fritz, and Andreas Bulling. 2017. MPIIGaze: Real-world dataset and deep appearance-based gaze estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence 41, 1 (2017), 162–175.