arxiv: 2601.17622 · v4 · submitted 2026-01-24 · 💻 cs.HC · cs.CL · cs.IR

Memento: Towards Proactive Visualization of Everyday Memories with Personal Wearable AR Assistant

Yoonsang Kim, Yalong Yang, Arie E. Kaufman

Pith reviewed 2026-05-16 10:46 UTC · model grok-4.3

classification 💻 cs.HC · cs.CL · cs.IR

keywords: augmented reality · proactive assistant · context-aware computing · wearable AR · memory visualization · conversational interface · everyday memories

The pith

Memento stores spoken queries with their time, place and activity contexts so an AR assistant can proactively recall and visualize them during matching daily situations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Memento, a conversational wearable AR system that permanently records each user query together with the surrounding spatiotemporal and activity details. These stored memories allow the system to identify recurring personal interests and the specific contexts that trigger them. When similar contexts reappear, Memento automatically surfaces up-to-date, context-relevant responses through AR overlays, turning isolated interactions into a continuous, long-term memory aid. A small user study collected feedback from participants with varying AR experience to surface practical challenges in keeping such assistance helpful rather than intrusive.

Core claim

Memento permanently captures and memorizes the user's verbal queries alongside their spatiotemporal and activity contexts. By storing these memories, it discovers connections between recurring interests and the contexts that trigger them. When it detects a similar context, Memento proactively recalls the user's interests and delivers up-to-date responses through AR, creating connected long-term interactions tailored to the user's multimodal context instead of treating each query as a one-off event.

What carries the argument

Persistent memory storage that links each verbal query to its spatiotemporal and activity context, enabling context-triggered proactive AR recall.
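
The storage-and-recall idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `Memory` fields, the thresholds, and the use of a plain linear scan are all assumptions made for the example. The passage quoted further down this page mentions cosine similarity and R-tree + HNSW indexing; the sketch keeps the cosine similarity and swaps the indexes for a brute-force loop.

```python
from dataclasses import dataclass
from math import hypot, sqrt


@dataclass
class Memory:
    """One stored query with its context (all field names are hypothetical)."""
    query: str        # transcribed verbal query
    position: tuple   # (x, y) in metres, in some shared room frame
    hour: float       # local time of day, 0-24
    activity: tuple   # activity embedding, any fixed length


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hour_gap(h1, h2):
    """Shortest distance between two times of day, wrapping at midnight."""
    d = abs(h1 - h2) % 24
    return min(d, 24 - d)


def recall(memories, position, hour, activity,
           max_dist=2.0, max_hours=1.0, min_sim=0.8):
    """Return memories whose place, time, and activity all match the current
    context, ordered by activity similarity (best first)."""
    hits = []
    for m in memories:
        dist = hypot(m.position[0] - position[0], m.position[1] - position[1])
        sim = cosine(m.activity, activity)
        if dist <= max_dist and hour_gap(m.hour, hour) <= max_hours and sim >= min_sim:
            hits.append((sim, m))
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in hits]
```

Requiring all three context dimensions to match is one possible design choice; a weighted score over the three would be another, and the page does not say which the authors use.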

If this is right

AR assistance shifts from reactive commands to persistent, context-driven support across days or weeks.
Daily routines gain seamless integration of personalized information without explicit user requests.
Design trade-offs emerge around accuracy, timing, and intrusiveness of proactive visualizations.
Long-term memory augmentation becomes possible by accumulating and reusing personal query history.
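
The trade-off between accuracy, timing, and intrusiveness noted in the list above can be made concrete with a small gating policy. The sketch below is purely illustrative: the class name, the score threshold, and the per-memory cooldown are assumptions, and nothing on this page indicates that Memento gates its overlays this way.

```python
class ProactiveGate:
    """Decide whether a matched memory should be surfaced right now."""

    def __init__(self, min_score=0.8, cooldown_s=600.0):
        self.min_score = min_score    # accuracy knob: ignore weak matches
        self.cooldown_s = cooldown_s  # intrusiveness knob: quiet period per memory
        self._last_shown = {}         # memory id -> time of the last overlay

    def should_show(self, memory_id, score, now_s):
        """Return True and record the time if the overlay may be shown."""
        if score < self.min_score:
            return False
        last = self._last_shown.get(memory_id)
        if last is not None and now_s - last < self.cooldown_s:
            return False
        self._last_shown[memory_id] = now_s
        return True
```

Raising `min_score` trades missed recalls for fewer false triggers, while lengthening `cooldown_s` trades timeliness for fewer interruptions.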

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same memory-linking approach could extend to non-verbal cues such as eye gaze or body posture to trigger recall.
Privacy and data-retention rules would need explicit handling because the system continuously logs personal contexts.
Over time, the accumulated memories might support trend analysis or personal knowledge graphs beyond immediate AR delivery.

Load-bearing premise

Recurring user interests are reliably triggered by detectable spatiotemporal and activity contexts, and proactive AR recall will feel helpful rather than intrusive or inaccurate.

What would settle it

A field deployment in which users report that Memento frequently surfaces irrelevant or outdated information during routine activities, or that the proactive overlays interrupt their tasks more than they help.

Figures

Figures reproduced from arXiv: 2601.17622 by Arie E. Kaufman, Yalong Yang, Yoonsang Kim.

**Figure 1.** Concept illustration of Memento. The user’s prior queries are stored in the form of “memories”, along with their referent, …

**Figure 2.** The view of situated AR in Memento. A physical refer…

**Figure 3.** Memento pipeline overview: With a verbal query, the visual, head-gaze, and the transcribed textual query from the user are sent to …

**Figure 4.** User interfaces of Memento. (A) Example visualization of …

**Figure 5.** The three real-world use-cases of Memento. Upon the contextual alignment of the user’s current spatial activity (Referent, Space, …

Original abstract

We introduce Memento, a conversational AR assistant that permanently captures and memorizes user's verbal queries alongside their spatiotemporal and activity contexts. By storing these "memories," Memento discovers connections between users' recurring interests and the contexts that trigger them. Upon detection of similar or identical spatiotemporal activity, Memento proactively recalls user interests and delivers up-to-date responses through AR, seamlessly integrating AR experience into their daily routine. Unlike prior work, each interaction in Memento is not a transient event, but a connected series of interactions with coherent long--term perspective, tailored to the user's broader multimodal (visual, spatial, temporal, and embodied) context. We conduct a preliminary evaluation through user feedbacks with participants of diverse expertise in immersive apps, and explore the value of proactive context-aware AR assistant in everyday settings. We share our findings and challenges in designing a proactive, context-aware AR system.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Memento sketches a persistent AR memory system but the preliminary feedback gives no numbers on whether the proactive recalls actually help or just intrude.

Memento stores user queries in AR along with their spatiotemporal and activity contexts, then tries to spot patterns and surface relevant memories proactively when similar situations arise. The pitch is that this turns isolated interactions into a connected, long-term memory aid instead of one-off sessions. The architecture description is clear enough: log the query plus context, discover recurring links, and push an updated response through the AR display when the context matches again.

They ran some initial feedback sessions with participants of varying immersive-app expertise, which at least brings in practical input on daily use. That part is useful for anyone thinking about wearable assistants. The paper also flags real design issues like avoiding intrusive false triggers.

The soft spot is the evaluation. It is labeled preliminary, yet the text gives no participant count, no task details, no accuracy figures for context matching, and no sense of how users felt about the recalls over time. Without those, the claim that the system delivers helpful long-term coherence stays untested. The assumption that detectable contexts will reliably surface useful interests without noise is left as an open question rather than measured.

This is a systems proposal aimed at HCI researchers working on AR interfaces and personal memory tools. Someone building similar context-aware assistants could borrow the storage and triggering approach, but the thin evidence means it is not yet something to cite as proof the idea works in practice. I would send it to peer review if the authors add a structured study with concrete metrics on detection precision and user perception; right now it reads more like an early demo than a substantiated result.

Referee Report

3 major / 2 minor

Summary. The paper introduces Memento, a conversational wearable AR assistant that permanently stores user verbal queries together with their spatiotemporal and activity contexts. It claims to discover connections between recurring interests and triggering contexts, then proactively recall and deliver updated responses via AR upon detecting matching contexts. This is positioned as creating coherent long-term interaction series (unlike prior transient systems) that are tailored to the user's multimodal context; support comes from a preliminary user feedback study with participants of diverse immersive-app expertise.

Significance. If the context-matching and proactive-recall components can be shown to operate with acceptable precision and without excessive intrusiveness, the work would offer a concrete step toward persistent, memory-augmented AR assistants that integrate into everyday routines. The emphasis on multimodal (visual, spatial, temporal, embodied) context and the explicit contrast with transient prior systems are potentially valuable contributions, though the current preliminary feedback provides only weak grounding for these advantages.

major comments (3)

[Evaluation] Evaluation section: the preliminary user feedback study is described only at a high level; no participant count, study duration, data-collection protocol, or quantitative metrics (e.g., context-matching precision, recall accuracy, or Likert-scale ratings of helpfulness vs. intrusiveness) are reported. Without these, the support for the central claim of discovering connections and delivering proactive value remains unverifiable.
[Abstract / Introduction] Abstract and §1: the assertion that 'each interaction in Memento is not a transient event, but a connected series of interactions with coherent long-term perspective' is presented as a distinguishing feature, yet no longitudinal data, false-positive rates for context matching, or evidence of sustained user perception of coherence are supplied.
[System Overview] System description: the assumption that recurring interests are reliably triggered by detectable spatiotemporal/activity contexts (and that proactive AR recall will be experienced as helpful rather than intrusive) is stated without any reported implementation details or validation of the matching algorithm's precision over time.
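
The context-matching metrics requested in the first major comment could be computed from a labelled log of proactive triggers. The sketch below assumes a hypothetical log format, one `(fired, relevant)` pair per observed context window, where `relevant` comes from participant or annotator judgement. It reports no results from the paper, which by the referee's account contains none.

```python
def trigger_metrics(events):
    """Compute precision and recall of proactive triggers.

    events: list of (fired, relevant) booleans, one per context window.
    fired    -> the system surfaced a memory in that window
    relevant -> a judge marked a recall as wanted in that window
    """
    tp = sum(1 for fired, relevant in events if fired and relevant)
    fp = sum(1 for fired, relevant in events if fired and not relevant)
    fn = sum(1 for fired, relevant in events if not fired and relevant)
    # None signals "undefined" when there is nothing to divide by.
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}
```

Here precision tracks how often a surfaced overlay was wanted, which relates to intrusiveness, and recall tracks how often a wanted memory was actually surfaced.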

minor comments (2)

[Abstract] Abstract contains the typographical string 'long--term' (double dash); standardize to 'long-term'.
[Figures] Figure captions and system diagrams would benefit from explicit labels indicating which components handle context storage versus proactive retrieval.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive feedback on our preliminary work. We agree that the evaluation description is high-level and that some claims require qualification given the early stage of the research. We will revise the manuscript to address these points and provide additional details where available from the study.

Referee: [Evaluation] Evaluation section: the preliminary user feedback study is described only at a high level; no participant count, study duration, data-collection protocol, or quantitative metrics (e.g., context-matching precision, recall accuracy, or Likert-scale ratings of helpfulness vs. intrusiveness) are reported. Without these, the support for the central claim of discovering connections and delivering proactive value remains unverifiable.

Authors: We agree that the Evaluation section requires expansion for verifiability. In the revision we will add the participant count, study duration, data-collection protocol, and available feedback metrics including Likert-scale ratings on helpfulness and intrusiveness. We will also explicitly note that quantitative precision/recall metrics for context matching were not computed in this exploratory study and remain future work.
revision: yes

Referee: [Abstract / Introduction] Abstract and §1: the assertion that 'each interaction in Memento is not a transient event, but a connected series of interactions with coherent long-term perspective' is presented as a distinguishing feature, yet no longitudinal data, false-positive rates for context matching, or evidence of sustained user perception of coherence are supplied.

Authors: We acknowledge the claim is aspirational. The preliminary feedback indicated positive user perception of connected interactions, but we lack longitudinal data. We will revise the Abstract and §1 to frame the statement as the system's design intent and initial user reactions, while adding an explicit limitations paragraph on the absence of longitudinal evidence and false-positive rates.
revision: partial

Referee: [System Overview] System description: the assumption that recurring interests are reliably triggered by detectable spatiotemporal/activity contexts (and that proactive AR recall will be experienced as helpful rather than intrusive) is stated without any reported implementation details or validation of the matching algorithm's precision over time.

Authors: We will expand the System Overview with additional implementation specifics on context encoding, storage, and matching logic. We will also report user-study observations on perceived helpfulness versus intrusiveness of proactive recalls to better ground the assumptions, while noting that long-term precision validation is planned for future deployments.
revision: yes

standing simulated objections not resolved

Absence of longitudinal data and quantitative precision metrics for context matching, which would require extended real-world deployment studies beyond the scope of this preliminary evaluation.

Circularity Check

0 steps flagged

No circularity: system proposal without derivations or self-referential reductions

The paper introduces a wearable AR system concept for capturing verbal queries with spatiotemporal/activity contexts and proactively recalling them on matching contexts. It describes the architecture, memory storage, and proactive recall mechanism, then reports preliminary qualitative user feedback. No equations, fitted parameters, predictions, or uniqueness theorems appear. The central claim that interactions form a 'connected series with coherent long-term perspective' is presented as a direct consequence of the described storage-and-recall design rather than derived from any prior result or self-citation chain. The distinction from transient prior work is asserted by construction of the new system, not by circular reduction. This is a standard honest non-finding for a design/proposal paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The proposal rests on domain assumptions about human behavior and context detection rather than new mathematical constructs or invented physical entities.

axioms (2)

domain assumption: Users exhibit recurring interests that are triggered by repeatable spatiotemporal and activity contexts.
Invoked to justify the value of proactive recall; appears in the description of memory discovery and triggering.

domain assumption: Context detection from wearable sensors is sufficiently accurate to identify similar situations reliably.
Required for the proactive mechanism to function without excessive false positives or misses.

pith-pipeline@v0.9.0 · 5455 in / 1346 out tokens · 41301 ms · 2026-05-16T10:46:40.097263+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: stores queries with spatiotemporal and activity contexts... cosine similarity... R-tree + HNSW... proactive recall on similar contexts

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, absolute_floor_iff_bare_distinguishability
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Referent-anchored Spatiotemporal Activity Memory (RSAM)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

VisionClaw: Always-On AI Agents through Smart Glasses
cs.HC · 2026-04 · unverdicted · novelty 5.0
VisionClaw couples continuous egocentric vision on smart glasses with speech-driven AI agents to enable hands-free real-world tasks, with lab and field studies showing faster completion and a shift toward opportunisti...

Exploring Experiential Differences Between Virtual and Physical Memory-Linked Objects in Extended Reality
cs.HC · 2026-03 · unverdicted · novelty 4.0
A study of 24 users finds that physical and virtual memory-linked objects in XR support stronger social connection and engagement than conventional gallery interfaces for sharing personal memories.
