pith. machine review for the scientific record.

arxiv: 2601.17622 · v4 · submitted 2026-01-24 · 💻 cs.HC · cs.CL · cs.IR


Memento: Towards Proactive Visualization of Everyday Memories with Personal Wearable AR Assistant


Pith reviewed 2026-05-16 10:46 UTC · model grok-4.3

classification 💻 cs.HC · cs.CL · cs.IR
keywords augmented reality · proactive assistant · context-aware computing · wearable AR · memory visualization · conversational interface · everyday memories

The pith

Memento stores spoken queries with their time, place and activity contexts so an AR assistant can proactively recall and visualize them during matching daily situations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Memento, a conversational wearable AR system that permanently records each user query together with the surrounding spatiotemporal and activity details. These stored memories allow the system to identify recurring personal interests and the specific contexts that trigger them. When similar contexts reappear, Memento automatically surfaces up-to-date, context-relevant responses through AR overlays, turning isolated interactions into a continuous, long-term memory aid. A small user study collected feedback from participants with varying AR experience to surface practical challenges in keeping such assistance helpful rather than intrusive.

Core claim

Memento permanently captures and memorizes the user's verbal queries alongside their spatiotemporal and activity contexts. By storing these memories, it discovers connections between recurring interests and the contexts that trigger them. When it detects a similar context, Memento proactively recalls the user's interests and delivers up-to-date responses through AR, creating connected long-term interactions tailored to the user's multimodal context instead of treating each query as a one-off event.

What carries the argument

Persistent memory storage that links each verbal query to its spatiotemporal and activity context, enabling context-triggered proactive AR recall.
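
As an illustration of what linking a query to its context could look like, here is a minimal Python sketch. It is not the paper's implementation: the `Context` and `Memory` records, the `recall_queries` function, and the distance and time-of-day thresholds are all hypothetical names and values chosen for this example, and the paper's actual matching logic is not described on this page.

```python
from dataclasses import dataclass
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Context:
    """Hypothetical context snapshot: where, when, and what the user is doing."""
    lat: float
    lon: float
    when: datetime
    activity: str


@dataclass(frozen=True)
class Memory:
    """A stored verbal query paired with the context it was asked in."""
    query: str
    context: Context


def distance_m(a: Context, b: Context) -> float:
    """Great-circle (haversine) distance between two contexts, in meters."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(h))


def hour_gap(a: Context, b: Context) -> float:
    """Circular difference in time of day, in hours (between 0 and 12)."""
    ha = a.when.hour + a.when.minute / 60
    hb = b.when.hour + b.when.minute / 60
    d = abs(ha - hb)
    return min(d, 24 - d)


def recall_queries(memories, now, max_m=50.0, max_hours=1.0):
    """Return stored queries whose context resembles the current one.

    A memory matches when the activity label is identical, the location is
    within max_m meters, and the time of day is within max_hours hours.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    return [
        m.query
        for m in memories
        if m.context.activity == now.activity
        and distance_m(m.context, now) <= max_m
        and hour_gap(m.context, now) <= max_hours
    ]
```

In this sketch, a query asked while making coffee at 08:30 would resurface the next morning at 08:45 in roughly the same spot, but not during a different activity at that location. A real system would likely replace the exact-match activity label and fixed thresholds with learned or fuzzy similarity, which is where the accuracy and intrusiveness trade-offs arise.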

If this is right

  • AR assistance shifts from reactive commands to persistent, context-driven support across days or weeks.
  • Daily routines gain seamless integration of personalized information without explicit user requests.
  • Design trade-offs emerge around accuracy, timing, and intrusiveness of proactive visualizations.
  • Long-term memory augmentation becomes possible by accumulating and reusing personal query history.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same memory-linking approach could extend to non-verbal cues such as eye gaze or body posture to trigger recall.
  • Privacy and data-retention rules would need explicit handling because the system continuously logs personal contexts.
  • Over time, the accumulated memories might support trend analysis or personal knowledge graphs beyond immediate AR delivery.

Load-bearing premise

Recurring user interests are reliably triggered by detectable spatiotemporal and activity contexts, and proactive AR recall will feel helpful rather than intrusive or inaccurate.

What would settle it

A field deployment in which users report that Memento frequently surfaces irrelevant or outdated information during routine activities, or that the proactive overlays interrupt their tasks more than they help.

Figures

Figures reproduced from arXiv: 2601.17622 by Arie E. Kaufman, Yalong Yang, Yoonsang Kim.

Figure 1: Concept illustration of Memento. The user’s prior queries are stored in the form of “memories”, along with their referent, …
Figure 2: The view of situated AR in Memento. A physical refer…
Figure 3: Memento pipeline overview: With a verbal query, the visual, head-gaze, and the transcribed textual query from the user are sent to …
Figure 4: User interfaces of Memento. (A) Example visualization of …
Figure 5: The three real-world use-cases of Memento. Upon the contextual alignment of the user’s current spatial activity (Referent, Space, …
Original abstract

We introduce Memento, a conversational AR assistant that permanently captures and memorizes user's verbal queries alongside their spatiotemporal and activity contexts. By storing these "memories," Memento discovers connections between users' recurring interests and the contexts that trigger them. Upon detection of similar or identical spatiotemporal activity, Memento proactively recalls user interests and delivers up-to-date responses through AR, seamlessly integrating AR experience into their daily routine. Unlike prior work, each interaction in Memento is not a transient event, but a connected series of interactions with coherent long--term perspective, tailored to the user's broader multimodal (visual, spatial, temporal, and embodied) context. We conduct a preliminary evaluation through user feedbacks with participants of diverse expertise in immersive apps, and explore the value of proactive context-aware AR assistant in everyday settings. We share our findings and challenges in designing a proactive, context-aware AR system.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Memento, a conversational wearable AR assistant that permanently stores user verbal queries together with their spatiotemporal and activity contexts. It claims to discover connections between recurring interests and triggering contexts, then proactively recall and deliver updated responses via AR upon detecting matching contexts. This is positioned as creating coherent long-term interaction series (unlike prior transient systems) that are tailored to the user's multimodal context; support comes from a preliminary user feedback study with participants of diverse immersive-app expertise.

Significance. If the context-matching and proactive-recall components can be shown to operate with acceptable precision and without excessive intrusiveness, the work would offer a concrete step toward persistent, memory-augmented AR assistants that integrate into everyday routines. The emphasis on multimodal (visual, spatial, temporal, embodied) context and the explicit contrast with transient prior systems are potentially valuable contributions, though the current preliminary feedback provides only weak grounding for these advantages.

major comments (3)
  1. [Evaluation] Evaluation section: the preliminary user feedback study is described only at a high level; no participant count, study duration, data-collection protocol, or quantitative metrics (e.g., context-matching precision, recall accuracy, or Likert-scale ratings of helpfulness vs. intrusiveness) are reported. Without these, the support for the central claim of discovering connections and delivering proactive value remains unverifiable.
  2. [Abstract / Introduction] Abstract and §1: the assertion that 'each interaction in Memento is not a transient event, but a connected series of interactions with coherent long-term perspective' is presented as a distinguishing feature, yet no longitudinal data, false-positive rates for context matching, or evidence of sustained user perception of coherence are supplied.
  3. [System Overview] System description: the assumption that recurring interests are reliably triggered by detectable spatiotemporal/activity contexts (and that proactive AR recall will be experienced as helpful rather than intrusive) is stated without any reported implementation details or validation of the matching algorithm's precision over time.
minor comments (2)
  1. [Abstract] Abstract contains the typographical string 'long--term' (double dash); standardize to 'long-term'.
  2. [Figures] Figure captions and system diagrams would benefit from explicit labels indicating which components handle context storage versus proactive retrieval.
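
The context-matching metrics requested in major comment 1 could be computed from a labeled log of trigger opportunities. The Python sketch below is only an illustration under stated assumptions: the `TriggerLog` record and `trigger_metrics` function are hypothetical, the `wanted` label is assumed to come from participant judgments, and the paper reports no such log or numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerLog:
    """One logged opportunity (hypothetical format).

    fired:  the system showed a proactive recall in this context.
    wanted: the participant judged that a recall was useful here.
    """
    fired: bool
    wanted: bool


def trigger_metrics(logs):
    """Compute precision, recall, and false-positive rate of proactive triggers.

    A metric is None when its denominator is zero.
    """
    tp = sum(1 for e in logs if e.fired and e.wanted)
    fp = sum(1 for e in logs if e.fired and not e.wanted)
    fn = sum(1 for e in logs if not e.fired and e.wanted)
    tn = sum(1 for e in logs if not e.fired and not e.wanted)
    return {
        "precision": tp / (tp + fp) if tp + fp else None,
        "recall": tp / (tp + fn) if tp + fn else None,
        "false_positive_rate": fp / (fp + tn) if fp + tn else None,
    }
```

Precision here reflects how often a surfaced overlay was welcome, while the false-positive rate reflects how often the system interrupted when nothing was wanted, which is the quantity most directly tied to perceived intrusiveness.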

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive feedback on our preliminary work. We agree that the evaluation description is high-level and that some claims require qualification given the early stage of the research. We will revise the manuscript to address these points and provide additional details where available from the study.

Point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the preliminary user feedback study is described only at a high level; no participant count, study duration, data-collection protocol, or quantitative metrics (e.g., context-matching precision, recall accuracy, or Likert-scale ratings of helpfulness vs. intrusiveness) are reported. Without these, the support for the central claim of discovering connections and delivering proactive value remains unverifiable.

    Authors: We agree that the Evaluation section requires expansion for verifiability. In the revision we will add the participant count, study duration, data-collection protocol, and available feedback metrics including Likert-scale ratings on helpfulness and intrusiveness. We will also explicitly note that quantitative precision/recall metrics for context matching were not computed in this exploratory study and remain future work. revision: yes

  2. Referee: [Abstract / Introduction] Abstract and §1: the assertion that 'each interaction in Memento is not a transient event, but a connected series of interactions with coherent long-term perspective' is presented as a distinguishing feature, yet no longitudinal data, false-positive rates for context matching, or evidence of sustained user perception of coherence are supplied.

    Authors: We acknowledge the claim is aspirational. The preliminary feedback indicated positive user perception of connected interactions, but we lack longitudinal data. We will revise the Abstract and §1 to frame the statement as the system's design intent and initial user reactions, while adding an explicit limitations paragraph on the absence of longitudinal evidence and false-positive rates. revision: partial

  3. Referee: [System Overview] System description: the assumption that recurring interests are reliably triggered by detectable spatiotemporal/activity contexts (and that proactive AR recall will be experienced as helpful rather than intrusive) is stated without any reported implementation details or validation of the matching algorithm's precision over time.

    Authors: We will expand the System Overview with additional implementation specifics on context encoding, storage, and matching logic. We will also report user-study observations on perceived helpfulness versus intrusiveness of proactive recalls to better ground the assumptions, while noting that long-term precision validation is planned for future deployments. revision: yes

standing simulated objections not resolved
  • Absence of longitudinal data and quantitative precision metrics for context matching, which would require extended real-world deployment studies beyond the scope of this preliminary evaluation.

Circularity Check

0 steps flagged

No circularity: system proposal without derivations or self-referential reductions

full rationale

The paper introduces a wearable AR system concept for capturing verbal queries with spatiotemporal/activity contexts and proactively recalling them on matching contexts. It describes the architecture, memory storage, and proactive recall mechanism, then reports preliminary qualitative user feedback. No equations, fitted parameters, predictions, or uniqueness theorems appear. The central claim that interactions form a 'connected series with coherent long-term perspective' is presented as a direct consequence of the described storage-and-recall design rather than derived from any prior result or self-citation chain. The distinction from transient prior work is asserted by construction of the new system, not by circular reduction. This is a standard honest non-finding for a design/proposal paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The proposal rests on domain assumptions about human behavior and context detection rather than new mathematical constructs or invented physical entities.

axioms (2)
  • domain assumption Users exhibit recurring interests that are triggered by repeatable spatiotemporal and activity contexts
    Invoked to justify the value of proactive recall; appears in the description of memory discovery and triggering.
  • domain assumption Context detection from wearable sensors is sufficiently accurate to identify similar situations reliably
    Required for the proactive mechanism to function without excessive false positives or misses.




Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. VisionClaw: Always-On AI Agents through Smart Glasses

    cs.HC 2026-04 unverdicted novelty 5.0

    VisionClaw couples continuous egocentric vision on smart glasses with speech-driven AI agents to enable hands-free real-world tasks, with lab and field studies showing faster completion and a shift toward opportunisti...

  2. Exploring Experiential Differences Between Virtual and Physical Memory-Linked Objects in Extended Reality

    cs.HC 2026-03 unverdicted novelty 4.0

    A study of 24 users finds that physical and virtual memory-linked objects in XR support stronger social connection and engagement than conventional gallery interfaces for sharing personal memories.
