
arxiv: 2606.20844 · v1 · pith:KTIT2IKDnew · submitted 2026-06-18 · 🧬 q-bio.NC

Relational Gaze Transitions During Encoding Predict Episodic Recall of Naturalistic Scenes

Pith reviewed 2026-06-26 14:47 UTC · model grok-4.3

classification 🧬 q-bio.NC
keywords eye tracking · episodic memory · scene perception · relational processing · gaze transitions · naturalistic scenes · encoding · free recall

The pith

Relational gaze transitions during first viewing of naturalistic scenes predict later free recall of objects and relations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether eye movements that shift between meaningfully related objects in complex scenes serve as a behavioral marker of how the brain organizes details into memorable events. It applies scene-graph labels to eye-tracking recordings to quantify these relational transitions both while participants first view the scenes and later when they retrieve them from memory on a blank screen. Relational scanning during initial encoding reliably forecasts success in recalling both individual objects and the links between them, even after statistical controls for low-level visual salience, number of fixations, semantic content, and overall image properties. In contrast, the same relational gaze measure during retrieval does not forecast recall accuracy. The findings position relational gaze as functionally important for memory formation rather than for retrieval itself.

Core claim

By annotating naturalistic scenes with graphs that link objects according to real-world relations, the study measures how often gaze moves between connected nodes during encoding. Participants exhibit above-chance relational gaze both at initial viewing and during blank-screen retrieval. The frequency of these encoding-phase transitions correlates with subsequent free-recall performance for object identities and for the relations themselves, surviving controls for salience, fixation count, meaning, and image-level variance. Retrieval-phase relational gaze shows no such predictive relation, indicating that the organizational process tracked by gaze is most critical while the memory is being l

What carries the argument

Scene-graph annotations applied to eye-tracking data to quantify relational gaze transitions (gaze shifts between meaningfully connected objects).
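
As a rough illustration of what such a measure could look like, the sketch below computes the share of gaze transitions that move between graph-connected objects. The definition used here (connected transitions divided by all between-object transitions), the function name `relational_transition_rate`, the undirected treatment of edges, and the toy scene are all assumptions made for illustration; the paper's exact operationalization is not given on this page.

```python
from typing import Iterable, Sequence, Tuple


def relational_transition_rate(
    fixated_objects: Sequence[str],
    edges: Iterable[Tuple[str, str]],
) -> float:
    """Share of between-object gaze transitions that follow a scene-graph edge.

    Assumptions: edges are undirected, and consecutive fixations on the same
    object (refixations) are not counted as transitions.
    """
    edge_set = {frozenset(pair) for pair in edges}
    transitions = [
        (a, b) for a, b in zip(fixated_objects, fixated_objects[1:]) if a != b
    ]
    if not transitions:
        return float("nan")  # no between-object transitions to score
    relational = sum(frozenset(t) in edge_set for t in transitions)
    return relational / len(transitions)


# Hypothetical scene: a person holding a cup that rests on a table.
toy_edges = [("person", "cup"), ("cup", "table")]
toy_scanpath = ["person", "cup", "cup", "table", "person", "lamp"]
print(relational_transition_rate(toy_scanpath, toy_edges))  # 0.5
```

In the toy scanpath, four between-object transitions remain after the refixation on the cup is dropped, and two of them follow an annotated edge, giving 0.5.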

If this is right

  • Relational gaze during encoding contributes to binding object details into coherent episodic memories.
  • The same gaze measure can be extracted from complex, real-world scenes rather than only simplified displays.
  • Relational organization occurs during initial exposure rather than during later retrieval attempts.
  • Gaze-based metrics may index successful memory formation independently of low-level visual features.
  • The approach extends measurement of relational processing to naturalistic viewing conditions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If relational gaze marks encoding success, training or guiding such transitions could improve memory in applied settings such as education or eyewitness testimony.
  • The dissociation between encoding and retrieval phases suggests that interventions timed to initial exposure may be more effective than those applied at test.
  • Future work could test whether disrupting relational gaze patterns during viewing selectively impairs relational memory while sparing item memory.
  • The method may generalize to dynamic video scenes if scene graphs can be extended over time.

Load-bearing premise

The scene graphs accurately represent the object relations that participants actually process while viewing the scenes.

What would settle it

A replication in which relational gaze at encoding no longer predicts recall once scene graphs are replaced by random or purely spatial object pairings.
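
One way the random-pairing half of such a control could be set up is sketched below: the annotated edges are swapped for an equal number of randomly chosen object pairs, and the gaze score is recomputed against each random graph. The helper names, the undirected-edge treatment, and the toy scene are assumptions. A purely spatial control would instead pair objects by distance, which this sketch does not attempt.

```python
import random
from itertools import combinations


def random_pairings(objects, n_edges, rng):
    """Draw n_edges distinct unordered object pairs uniformly at random."""
    all_pairs = list(combinations(sorted(objects), 2))
    return rng.sample(all_pairs, n_edges)


def transition_rate(scanpath, edges):
    """Proportion of between-object transitions that follow an edge."""
    edge_set = {frozenset(e) for e in edges}
    moves = [(a, b) for a, b in zip(scanpath, scanpath[1:]) if a != b]
    if not moves:
        return float("nan")
    return sum(frozenset(m) in edge_set for m in moves) / len(moves)


def random_graph_scores(scanpath, objects, n_edges, n_draws=1000, seed=0):
    """Gaze scores recomputed against randomly re-paired scene graphs."""
    rng = random.Random(seed)
    return [
        transition_rate(scanpath, random_pairings(objects, n_edges, rng))
        for _ in range(n_draws)
    ]


# Hypothetical scene and scanpath, for illustration only.
objects = ["person", "cup", "table", "lamp"]
annotated = [("person", "cup"), ("cup", "table")]
scanpath = ["person", "cup", "table", "cup", "person", "lamp"]

observed = transition_rate(scanpath, annotated)
null_scores = random_graph_scores(scanpath, objects, n_edges=len(annotated))
print(observed, sum(null_scores) / len(null_scores))
```

In the replication described above, scores computed against the random graphs would stand in for the annotated-graph score as the predictor of recall; the sketch covers only the construction of those control scores, not the recall model.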

Original abstract

Remembering a visual scene requires organizing distinct details into a cohesive event. This study investigates whether relation-guided gaze transitions provide a behavioural marker of this cognitive organization during episodic encoding and retrieval. By applying scene graph annotations to eye-tracking data, we measured whether gaze moved between objects that were meaningfully related within complex scenes. This approach allowed us to quantify relational scanning within naturalistic environments, moving beyond prior methods that relied on simplified displays or isolated relation types. Participants showed above-chance relational gaze during both initial viewing and blank-screen retrieval, indicating that gaze actively tracks scene structure during first viewing and at recall. Additionally, relational scanning at encoding predicted subsequent free recall of both object and relational details, even after accounting for salience, fixation frequency, meaning, and image-level differences. In contrast, relational scanning at retrieval did not predict recall success, suggesting that relational gaze is most functional to memory during its formation. Together, these findings show that relational gaze can be measured in complex scenes and may serve as a marker of episodic encoding during natural visual exploration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that applying scene-graph annotations to eye-tracking data from participants viewing naturalistic scenes reveals above-chance relational gaze transitions (between meaningfully connected objects) during both encoding and blank-screen retrieval. Relational scanning at encoding predicts subsequent free recall of both object and relational details, even after statistical controls for salience, fixation frequency, meaning, and image-level differences; the same measure at retrieval does not predict recall success. The authors interpret relational gaze as a behavioral marker of episodic encoding in complex, real-world scenes.

Significance. If the central empirical result is robust, the work supplies a measurable, naturalistic index of relational organization during memory formation that goes beyond simplified displays or single relation types. The encoding-versus-retrieval dissociation and the reported controls are potentially informative for models linking visual exploration to episodic memory.

major comments (2)
  1. [Methods] Methods (scene-graph construction and application): the central claim that relational gaze indexes cognitive organization relevant to encoding rests on the untested assumption that static, annotator-derived scene-graph edges correspond to the relations participants actually process during viewing. No participant validation, salience-weighted edge analysis, or comparison against alternative relational annotations is described to rule out the possibility that the predictive effect is driven by low-level co-occurrence or annotator bias rather than memory-relevant structure.
  2. [Results] Results (control analyses): although the abstract states that the encoding prediction survives controls for salience, fixation frequency, meaning, and image-level differences, the manuscript does not report the precise operationalization of these covariates, the model specifications, or effect-size changes after each control. Without these details it is impossible to evaluate whether the relational-gaze term remains load-bearing once all plausible confounds are entered.
minor comments (2)
  1. [Abstract] The abstract and introduction should explicitly define 'relational scanning' (e.g., proportion of transitions between graph-connected objects versus total transitions) and state the chance baseline used for the above-chance claim.
  2. [Figures] Figure legends and methods should clarify how blank-screen retrieval trials were aligned with the original scene graphs for the relational-gaze measure.
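
To make the first minor comment concrete, the sketch below shows one common way a chance baseline for an above-chance claim can be built: the order of fixated objects is shuffled many times and the observed transition rate is compared with the shuffled distribution. This is an assumed baseline, not the one the manuscript uses (which is not described on this page), and the function names and toy data are hypothetical.

```python
import random


def transition_rate(scanpath, edges):
    """Proportion of between-object transitions that follow an edge."""
    edge_set = {frozenset(e) for e in edges}
    moves = [(a, b) for a, b in zip(scanpath, scanpath[1:]) if a != b]
    if not moves:
        return float("nan")
    return sum(frozenset(m) in edge_set for m in moves) / len(moves)


def shuffle_baseline(scanpath, edges, n_perm=2000, seed=0):
    """Return observed rate, mean shuffled rate, and a one-sided p-value."""
    rng = random.Random(seed)
    observed = transition_rate(scanpath, edges)
    null = []
    for _ in range(n_perm):
        shuffled = list(scanpath)
        rng.shuffle(shuffled)  # destroys fixation order, keeps object counts
        null.append(transition_rate(shuffled, edges))
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (1 + sum(s >= observed for s in null)) / (n_perm + 1)
    return observed, sum(null) / len(null), p_value


# Hypothetical scene with six objects and three annotated relations.
edges = [("person", "cup"), ("cup", "table"), ("table", "lamp")]
path = ["person", "cup", "table", "lamp", "sky", "tree"]
obs, null_mean, p = shuffle_baseline(path, edges)
print(obs, null_mean, p)
```

Shuffling preserves which objects were fixated and how often, so the baseline isolates the contribution of fixation order. Other baselines (for example, pairing gaze from one scene with the graph of another) would answer a slightly different question.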

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We respond to each major comment below, indicating planned revisions where appropriate.

Point-by-point responses
  1. Referee: [Methods] Methods (scene-graph construction and application): the central claim that relational gaze indexes cognitive organization relevant to encoding rests on the untested assumption that static, annotator-derived scene-graph edges correspond to the relations participants actually process during viewing. No participant validation, salience-weighted edge analysis, or comparison against alternative relational annotations is described to rule out the possibility that the predictive effect is driven by low-level co-occurrence or annotator bias rather than memory-relevant structure.

    Authors: We acknowledge that the scene-graph annotations are static and annotator-derived without direct participant validation of the specific relations processed during viewing. While these annotations follow standard protocols from the scene-understanding literature and the reported effects survive controls for low-level factors, we agree this leaves open the possibility of annotator bias or co-occurrence driving the results. In revision we will add a limitations paragraph explicitly discussing this assumption, report inter-annotator agreement statistics for the scene graphs, and outline potential future validation approaches. No new participant data collection is feasible at this stage. revision: partial

  2. Referee: [Results] Results (control analyses): although the abstract states that the encoding prediction survives controls for salience, fixation frequency, meaning, and image-level differences, the manuscript does not report the precise operationalization of these covariates, the model specifications, or effect-size changes after each control. Without these details it is impossible to evaluate whether the relational-gaze term remains load-bearing once all plausible confounds are entered.

    Authors: The control analyses appear in the Results, but we agree that the precise operational definitions, full model specifications, and stepwise effect-size changes are not reported with sufficient detail. In the revised manuscript we will expand the Methods and Results sections to define each covariate explicitly (including how salience maps, meaning ratings, and image-level factors were quantified), provide the complete mixed-effects regression equations, and add supplementary tables showing coefficient estimates and effect sizes before versus after each successive control. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper reports an empirical eye-tracking study that applies pre-existing scene graph annotations to measure relational gaze transitions in naturalistic scenes, then tests whether those transitions statistically predict subsequent free recall after controlling for salience, fixation frequency, meaning, and image-level factors. All load-bearing steps consist of data collection, annotation application, and regression analyses on observed participant behavior; none reduce by definition or self-citation to the target outcome. The derivation chain is therefore self-contained against external benchmarks (recall performance) and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract mentions no free parameters, invented entities, or non-standard axioms. Standard statistical assumptions for regression-based prediction are implicit but not detailed.

axioms (1)
  • domain assumption: Regression models can isolate the unique contribution of relational gaze after controlling for salience, fixation frequency, meaning, and image-level factors.
    Invoked when stating that the prediction holds after accounting for those variables.
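
For orientation, one specification that would be consistent with this axiom is a mixed-effects logistic regression of recall on relational gaze plus the named covariates. Every symbol below is an assumption introduced for illustration (the indices $p$ for participant and $j$ for image, the covariate names, and the random-intercept structure); the manuscript's actual model is not reported on this page.

```latex
\operatorname{logit}\,\Pr(\mathrm{recalled}_{pj} = 1)
  = \beta_0
  + \beta_1\,\mathrm{RelGaze}_{pj}
  + \beta_2\,\mathrm{Salience}_{pj}
  + \beta_3\,\mathrm{FixCount}_{pj}
  + \beta_4\,\mathrm{Meaning}_{pj}
  + u_p + v_j,
\qquad
u_p \sim \mathcal{N}(0, \sigma_u^2),\quad
v_j \sim \mathcal{N}(0, \sigma_v^2)
```

Under this reading, the axiom amounts to assuming that $\beta_1$ is identifiable, which requires that relational gaze is not collinear with the covariates and that the image-level intercept $v_j$ absorbs the image-level differences mentioned in the abstract.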

pith-pipeline@v0.9.1-grok · 5708 in / 1131 out tokens · 23240 ms · 2026-06-26T14:47:59.001500+00:00 · methodology

