pith. machine review for the scientific record.

arxiv: 2604.09438 · v1 · submitted 2026-04-10 · 💻 cs.HC

Recognition: unknown

Intent Lenses: Inferring Capture-Time Intent to Transform Opportunistic Photo Captures into Structured Visual Notes

Aeneas Leon Sommer, Ashwin Ram, Jürgen Steimle, Martin Schmitz

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 17:04 UTC · model grok-4.3

classification 💻 cs.HC
keywords: opportunistic photo capture · intent inference · visual notes · large language models · sensemaking · interactive systems · conference notes · structured notes

The pith

Inferring capture-time intent from photos lets large language models create structured visual notes that reflect what users meant to capture.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Intent Lenses as a conceptual primitive to turn opportunistic photos into meaningful notes. It uses large language models to infer the user's intent at capture time and generates reusable interactive objects that specify what function to apply, which parts of the photo to focus on, and how to represent the results. This is demonstrated in an interactive system for academic conference photos where lenses are placed on a spatial canvas for sensemaking. A study with nine academics indicated that these intent-mediated notes matched expectations and supported both overviews and deeper exploration. Such an approach addresses the common problem of photo collections remaining unstructured and underused.

Core claim

Intent Lenses reify users' capture-time intent inferred from captured information into reusable interactive objects that encode the function to perform, the information sources to focus on, and how results are represented at an appropriate level of detail. These lenses are dynamically generated using the reasoning capabilities of large language models and, when applied to conference presentation captures, produce structured visual notes on a spatial canvas that users can further manipulate.

What carries the argument

Intent Lenses: reusable interactive objects that encode the function to perform, the information sources to focus on, and the level of representation detail, generated dynamically by large language models from the photo's visual content.
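To make the object concrete, here is a minimal sketch of what such a lens could look like as a data structure. Everything below (names, fields, the string-based function slot) is a hypothetical illustration inferred from the abstract's description, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class IntentLens:
    """Hypothetical sketch of an Intent Lens: one object bundling the inferred
    intent, the function to perform, the information sources to focus on, and
    the representation detail level."""
    intent_label: str          # inferred capture-time intent, e.g. "collect references"
    function: str              # note-generation function to apply, e.g. "extract_citations"
    focus_regions: list[str]   # parts of the capture to attend to, e.g. ["reference list"]
    detail_level: str          # representation granularity: "overview" or "detailed"

    def apply(self, capture: str) -> str:
        """Render a structured note for one capture (stub)."""
        return (f"[{self.intent_label} / {self.detail_level}] "
                f"{self.function} over {', '.join(self.focus_regions)} of {capture}")

# Reusability is the point: the same inferred lens can be reapplied to new captures.
lens = IntentLens("collect references", "extract_citations", ["reference list"], "overview")
print(lens.apply("slide photo from talk A"))
print(lens.apply("slide photo from talk B"))
```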

If this is right

  • Intent-mediated notes align with users' expectations for what they intended to capture.
  • The notes provide effective overviews of captures while facilitating deeper sensemaking.
  • Users can add, link, and arrange lenses across captures to support exploration (a minimal data sketch follows this list).
  • The system transforms generic photo collections into personalized structured notes.
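The add/link/arrange vocabulary above maps naturally onto a small canvas model. Continuing the hypothetical sketch, with the caveat that the paper's canvas surely carries richer state than this:

```python
from dataclasses import dataclass, field

@dataclass
class Canvas:
    """Hypothetical spatial canvas holding placed lenses and user-drawn links."""
    placements: dict[str, tuple[float, float]] = field(default_factory=dict)
    links: list[tuple[str, str]] = field(default_factory=list)

    def add(self, lens_id: str, x: float, y: float) -> None:
        self.placements[lens_id] = (x, y)      # place a new lens on the canvas

    def link(self, a: str, b: str) -> None:
        self.links.append((a, b))              # connect lenses across captures

    def arrange(self, lens_id: str, x: float, y: float) -> None:
        self.placements[lens_id] = (x, y)      # reposition during sensemaking
```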

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar lenses could be applied in non-academic settings such as capturing product information during shopping or artifact details at museums.
  • Over time, patterns in inferred intents might inform better default lenses or user-specific models.
  • Combining lenses with other data sources, such as timestamps or location, could further refine the inference process, as sketched below.
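On the last point: photo files already carry EXIF timestamps and GPS tags that could be appended to the inference prompt alongside the image. A minimal sketch using Pillow (9.4 or newer for the IFD enum); the paper describes no such mechanism, so treat this purely as an illustration.

```python
from PIL import Image, ExifTags

def capture_context(path: str) -> dict:
    """Pull timestamp and GPS hints from a photo's EXIF metadata.

    Hypothetical helper: such context (e.g. 'taken 14:32 near the venue')
    could disambiguate visually similar captures during intent inference.
    """
    exif = Image.open(path).getexif()
    sub = exif.get_ifd(ExifTags.IFD.Exif)
    return {
        # 36867 = DateTimeOriginal (Exif sub-IFD), 306 = DateTime (IFD0)
        "timestamp": sub.get(36867) or exif.get(306),
        "gps": dict(exif.get_ifd(ExifTags.IFD.GPSInfo)) or None,
    }
```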

Load-bearing premise

That large language models can accurately and consistently infer users' specific capture-time intent solely from the visual content of opportunistic photos without additional context or user input.
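Concretely, the premise stands or falls on a single vision-language call per photo. A sketch of what that call might look like, using the OpenAI Python SDK as a stand-in backend; the paper's actual model, prompt, and output schema are not public, so every string here is an assumption.

```python
import base64
import json

from openai import OpenAI

client = OpenAI()  # stand-in backend; assumes OPENAI_API_KEY is set

def infer_intent(image_path: str) -> dict:
    """Guess capture-time intent from the image alone: the load-bearing step.

    Hypothetical prompt and schema. Note that even here the domain framing
    ('academic talk') is extra context beyond the pixels themselves.
    """
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model choice
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text":
                    "This photo was captured opportunistically at an academic talk. "
                    "Infer the single most likely capture intent and reply as JSON "
                    'with keys "intent", "function", "focus", "detail_level".'},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return json.loads(resp.choices[0].message.content)
```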

What would settle it

A study in which participants compare the alignment of generated notes against their own recalled capture intent. If intent-inferred lenses show no advantage over generic summaries, or no gain in sensemaking ratings, the central claim fails.
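In analysis terms, this is a paired within-subjects comparison: each participant rates intent alignment (and sensemaking support) for the same captures rendered once with intent-inferred lenses and once as generic summaries. A sketch of the decisive test, assuming Likert-style ratings; the design is extrapolated from the critique above, not taken from the paper.

```python
from scipy.stats import wilcoxon

def settles_it(lens_ratings: list[int], generic_ratings: list[int]) -> None:
    """Paired non-parametric test on per-capture alignment ratings.

    One-sided alternative: intent-inferred lenses rate higher than generic
    summaries. A non-significant result would undermine the claim that accurate
    intent inference, rather than the canvas format, drives the benefit.
    """
    stat, p = wilcoxon(lens_ratings, generic_ratings, alternative="greater")
    print(f"Wilcoxon W={stat}, one-sided p={p:.3f}")
```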

Figures

Figures reproduced from arXiv: 2604.09438 by Aeneas Leon Sommer, Ashwin Ram, Jürgen Steimle, Martin Schmitz.

Figure 1: Intent Lenses are reusable interactive representations of users’ capture-time intent, inferred from opportunistic photo captures in information-rich settings such as academic conferences. These lenses automatically transform captured photos into structured visual notes that align with what users intended to capture. Users can further treat inferred lenses as reusable objects that can be reapplied to presen… (caption truncated at source)

Figure 2: Intent Lens construction pipeline. It consists of four LLM-based stages: intent inference, intent clustering, intent… (caption truncated at source)

Figure 3: Graphical interface for exploring and making sense of opportunistic conference photo captures using… (caption truncated at source)

Figure 4: Overview of descriptive statistics for participants’ provided photo captures and interaction behavior with… (caption truncated at source)
Original abstract

Opportunistic photo capture (e.g., slides, exhibits, or artifacts) is a common strategy for preserving information encountered in information-rich environments for later revisitation. While fast and minimally disruptive, such photo collections rarely become meaningful notes. Existing automatic note-generation approaches provide some support but often produce generic summaries that fail to reflect what users intended to capture. We introduce Intent Lenses, a conceptual primitive for intent-mediated note generation and sensemaking. Intent Lenses reify users' capture-time intent inferred from captured information into reusable interactive objects that encode the function to perform, the information sources to focus on, and how results are represented at an appropriate level of detail. These lenses are dynamically generated using the reasoning capabilities of large language models. To investigate this concept, we instantiate Intent Lenses in the context of academic conference photos and present an interactive system that infers lenses from presentation captures to generate structured visual notes on a spatial canvas. Users can further add, link, and arrange lenses across captures to support exploration and sensemaking. A study with nine academics showed that intent-mediated notes aligned with users' expectations, providing effective overviews of their captures while facilitating deeper sensemaking.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces Intent Lenses as a conceptual primitive for intent-mediated note generation: LLMs infer capture-time intent from opportunistic photos (e.g., conference slides) to produce reusable interactive objects that specify functions, information sources, and representation detail. These are instantiated in an interactive system for academic photos on a spatial canvas where users can add, link, and arrange lenses. A qualitative study with nine academics reports that the resulting notes aligned with user expectations, provided effective overviews, and supported deeper sensemaking.

Significance. If the core inference mechanism can be shown to reliably recover user intent, the work offers a promising HCI primitive for turning ad-hoc photo collections into personalized, explorable notes. The reification of inferred intent into dynamic, composable lenses on a canvas is a concrete design contribution that extends beyond generic summarization and could influence future sensemaking tools that integrate LLM reasoning with direct manipulation.

major comments (1)
  1. Evaluation section: The n=9 qualitative study reports post-use subjective alignment with expectations but contains no pre-capture user statements of intent, no quantitative accuracy/precision metrics of the LLM inference against those statements, and no baseline comparison (e.g., generic summarization without intent lenses). This leaves the central claim—that the notes succeed because intent was accurately inferred from visual content alone—untested; observed benefits could arise from the structured canvas format regardless of inference quality.
minor comments (2)
  1. The manuscript should provide concrete examples of LLM prompts, the exact model and parameters used, and any observed inference failures or edge cases to support reproducibility and allow readers to assess the reliability of the dynamic lens generation.
  2. Figure captions and system description would benefit from clearer distinction between automatically generated lenses and user-added or edited elements to help readers understand the boundary between inference and manual sensemaking.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on the evaluation. We address the major comment below and will revise the manuscript to strengthen the discussion of study limitations and design rationale while preserving the qualitative focus appropriate to this early-stage conceptual contribution.

Point-by-point responses
  1. Referee: Evaluation section: The n=9 qualitative study reports post-use subjective alignment with expectations but contains no pre-capture user statements of intent, no quantitative accuracy/precision metrics of the LLM inference against those statements, and no baseline comparison (e.g., generic summarization without intent lenses). This leaves the central claim—that the notes succeed because intent was accurately inferred from visual content alone—untested; observed benefits could arise from the structured canvas format regardless of inference quality.

    Authors: We agree that a quantitative evaluation of inference accuracy against explicit pre-capture intent would provide stronger evidence. However, the opportunistic nature of photo capture means users rarely articulate precise intent before taking a photo; the study instead measured whether the resulting notes aligned with participants' post-review expectations, which serves as a proxy for intent fidelity in this context. We will revise the Evaluation and Discussion sections to explicitly acknowledge this limitation, clarify that benefits may partly derive from the spatial canvas, and include a qualitative comparison of lens-generated notes versus generic LLM summaries based on participant comments. A controlled quantitative baseline study lies beyond the scope of the current exploratory work but is noted as future research.

    revision: partial

Circularity Check

0 steps flagged

No circularity; claims rest on an independent user study and external LLM capabilities.

full rationale

The paper presents a conceptual system (Intent Lenses) that uses LLMs to infer capture-time intent from photos and generates structured notes, evaluated via a separate n=9 user study. No equations, parameters, or derivations exist that could reduce outputs to inputs by construction. Central claims are supported by the external reasoning of LLMs and post-study subjective feedback rather than any self-referential fit or definition. No load-bearing self-citations or ansatzes are present. This is a standard non-circular HCI systems paper with empirical grounding outside its own definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The paper introduces a new conceptual object (Intent Lenses) and relies on standard HCI assumptions about user intent and LLM capabilities without introducing fitted parameters or ungrounded entities.

axioms (2)
  • domain assumption: Users form a specific, inferable intent when opportunistically capturing photos of information-rich scenes
    This premise underpins the entire intent-inference mechanism and is invoked in the design of the lenses.
  • domain assumption: Large language models possess sufficient reasoning capabilities to translate photo content into structured note-generation functions
    Central to the dynamic generation of lenses; treated as an external capability rather than derived.
invented entities (1)
  • Intent Lenses (no independent evidence)
    purpose: Reusable interactive objects that encode capture intent, information focus, and output representation for note generation
    New conceptual primitive introduced to mediate between raw photos and structured notes.

pith-pipeline@v0.9.0 · 5522 in / 1442 out tokens · 61709 ms · 2026-05-10T17:04:19.400959+00:00 · methodology

