arxiv: 2606.05040 · v1 · pith:BJKAOLZInew · submitted 2026-06-03 · 💻 cs.IR

SearchLog: A Web Browser Extension for Capturing Search Logs in Laboratory Studies

Pith reviewed 2026-06-28 03:43 UTC · model grok-4.3

classification 💻 cs.IR
keywords search logs · browser extension · laboratory studies · information seeking · user interaction · web search · AI summaries

The pith

SearchLog is a browser extension that captures mouse, keyboard, search query, and browser state data while users conduct open-web searches in lab studies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents SearchLog as a tool that lets lab participants search the real web inside their browser while the extension records structured events. It logs clicks, scrolls, typed text, queries, result rankings, AI summaries, tab switches, and window changes, then saves them locally as ordered JSON streams with HTML snapshots. A sympathetic reader would care because existing methods either simulate search or lose natural behavior, and this extension aims to combine ecological validity with precise, reusable data for measuring reformulation, dwell time, and path complexity.

Core claim

SearchLog allows participants to search the open web using a browser while recording structured interaction data across mouse, keyboard, search activity, and browser state modules. The extension captures clicks, scrolling, hovered text, typed words, search queries, result rankings, AI-generated summaries when available, tab activity, and window changes. A local Flask backend stores each session as an ordered JSON event stream, with HTML snapshots and preprocessed search result data for later analysis.

What carries the argument

The SearchLog browser extension, whose four capture modules (mouse, keyboard, search activity, browser state) feed a local Flask backend that writes ordered JSON event streams.
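To make the storage step concrete, here is a minimal sketch of an append-only session store using only the Python standard library. It is not the authors' code: the class name `SessionStore`, the one-file-per-session JSON Lines layout, and the `seq` counter are all assumptions introduced for illustration. Only the `session_id`, `timestamp`, `event`, `action`, `typed`, `search_engine`, and `query` fields come from the paper's example log. In a Flask backend, a route handler could call `append` once per received event.

```python
import json
import tempfile
from pathlib import Path


class SessionStore:
    """Append-only store: one JSON Lines file per session, in arrival order."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        # ASSUMPTION: "seq" is a hypothetical per-session counter, not a
        # field documented for SearchLog.
        self._next_seq = {}

    def append(self, event):
        session_id = event["session_id"]
        seq = self._next_seq.get(session_id, 0)
        self._next_seq[session_id] = seq + 1
        record = dict(event, seq=seq)  # copy so the caller's dict is untouched
        path = self.root / f"{session_id}.jsonl"
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
        return record

    def read(self, session_id):
        path = self.root / f"{session_id}.jsonl"
        with path.open(encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]


def demo():
    """Write two events to a temporary directory and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        store = SessionStore(tmp)
        store.append({"session_id": "s1", "timestamp": 1000, "event": "keyboard",
                      "action": "write", "typed": "how solar panels work"})
        store.append({"session_id": "s1", "timestamp": 1937, "event": "search",
                      "search_engine": "google", "query": "how solar panels work"})
        return store.read("s1")
```

Appending one line per event keeps the file readable even if a session ends abruptly, which is one reason to prefer it over rewriting a single JSON array; whether SearchLog does this is not stated in the material above.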

If this is right

  • Researchers can compute query reformulation rates, page visit sequences, dwell times, scroll depths, tab-switching patterns, and exposure to AI-generated summaries from the same log files.
  • The same extension supports experiments on both traditional result lists and AI-enhanced search interfaces without changing the participant setup.
  • Session metadata can be linked to experimental conditions, enabling controlled comparisons across user groups or interface variants.
  • The reusable extension reduces the need for custom logging code in future lab studies of information seeking.
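As a rough illustration of the first point, the sketch below derives a few session measures from a list of event dictionaries. The `timestamp`, `event`, and `query` fields follow the paper's example log; the `"tab"` event name, the function name `search_metrics`, and the definition of reformulation rate (changed queries divided by total queries) are assumptions, and other definitions are equally defensible.

```python
def search_metrics(events):
    """Summarise one session's events (a list of dicts sorted by timestamp)."""
    searches = [e for e in events if e.get("event") == "search"]
    queries = [e["query"].strip().lower() for e in searches]

    # A reformulation is counted whenever a query differs from the previous one.
    reformulations = sum(1 for prev, cur in zip(queries, queries[1:]) if prev != cur)

    # Time between successive search events, in seconds (timestamps are in ms).
    gaps = [(b["timestamp"] - a["timestamp"]) / 1000
            for a, b in zip(searches, searches[1:])]

    # ASSUMPTION: tab activity is logged with "event": "tab".
    tab_events = sum(1 for e in events if e.get("event") == "tab")

    return {
        "n_queries": len(queries),
        "reformulations": reformulations,
        "reformulation_rate": reformulations / len(queries) if queries else 0.0,
        "seconds_between_queries": gaps,
        "tab_events": tab_events,
    }


# Made-up session for illustration only.
session = [
    {"timestamp": 0, "event": "search", "query": "how solar panels work"},
    {"timestamp": 4000, "event": "tab"},
    {"timestamp": 30000, "event": "search", "query": "solar panel efficiency"},
    {"timestamp": 45000, "event": "search", "query": "Solar panel efficiency "},
]
metrics = search_metrics(session)
```

Queries are normalised by case and surrounding whitespace before comparison, so the third query in the example counts as a repeat rather than a reformulation.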

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The structured JSON format could be fed directly into automated analysis pipelines that detect complex search strategies without manual coding.
  • If the extension proves stable, labs could run multi-session studies where participants keep the tool installed across days or weeks.
  • The capture of AI summaries alongside user actions opens direct measurement of how generated content affects click and reading behavior.

Load-bearing premise

Participants will install and use the extension without changing how they normally search the web, and the local storage will record every event without loss or distortion.

What would settle it

A side-by-side comparison in which the same participants perform identical search tasks with and without SearchLog installed, showing measurable shifts in query length, dwell time, or tab-switching rate, or logs that omit events visible in independent screen recordings.
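The two checks described here could be sketched as follows. Everything in this block is hypothetical: the function names, the 250 ms matching tolerance, and the numbers, which are invented for illustration and are not results from the paper.

```python
from statistics import mean


def paired_mean_difference(with_ext, without_ext):
    """Mean of per-participant differences (with extension minus without)."""
    if len(with_ext) != len(without_ext):
        raise ValueError("need one value per participant in each condition")
    return mean(a - b for a, b in zip(with_ext, without_ext))


def missed_events(reference_ms, logged_ms, tolerance_ms=250):
    """Reference timestamps that have no logged event within the tolerance."""
    return [t for t in reference_ms
            if not any(abs(t - logged) <= tolerance_ms for logged in logged_ms)]


# Invented numbers: mean query length per participant in each condition.
query_len_with = [4, 3, 5, 4]
query_len_without = [4, 4, 5, 3]
diff = paired_mean_difference(query_len_with, query_len_without)

# Invented numbers: click times hand-coded from a screen recording,
# compared against click timestamps taken from the log.
annotated_clicks = [1000, 5200, 9100]
logged_clicks = [1040, 9050]
missing = missed_events(annotated_clicks, logged_clicks)
```

A real study would add a significance test and a principled tolerance; the sketch only shows the shape of the comparison.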

Figures

Figures reproduced from arXiv: 2606.05040 by Damiano Spina, Dana McKay, Jiaman He, Johanne R. Trippas, Riccardo Xia.

Figure 1
Workflow of SearchLog. After the researcher starts a session, participants search naturally in the web browser while the extension records search and browser interactions. The local server saves the event logs and snapshots for later analysis. [Extracted diagram labels: Session ID, Timestamp, Event, Window, Tab, Mouse, Search, Keyboard, Click, Mouse move, Scroll, Write, Ranking, Session, Rank, Type, AI summary, Text result, Title, Link, Description, Sea…]

Figure 2
Structure of the data stored by SearchLog. Before a study session, the researcher installs the SearchLog Chromium extension and starts the local server on the experiment machine. The researcher then starts logging through the extension dialog, while participants use the browser naturally during the search task. After the task, the researcher stops logging through the extension dialog. Each session is saved…

Figure 3
Example event log. Detailed installation and usage instructions are provided in the GitHub repository.

    { "session_id": "1779009704_a1cb7ffe", "timestamp": 1779009711393, "event": "keyboard", "action": "write", "typed": "how solar panels work" },
    { "session_id": "1779009704_a1cb7ffe", "timestamp": 1779009712330, "event": "search", "search_engine": "google", "query": "how solar panels work", "filena…
Original abstract

Natural search logs are valuable for studying search behavior in information seeking settings. We present SearchLog, an easy-to-install web browser extension for collecting natural search logs during lab-based studies. SearchLog allows participants to search the open web using a browser while recording structured interaction data across mouse, keyboard, search activity, and browser state modules. The extension captures clicks, scrolling, hovered text, typed words, search queries, result rankings, AI-generated summaries when available, tab activity, and window changes. A local Flask backend stores each session as an ordered JSON event stream, with HTML snapshots and preprocessed search result data for later analysis. These logs can be used to derive measures such as query reformulation, page visits, dwell time, scroll behavior, tab switching, search path complexity, and exposure to AI-generated search content. By supporting natural browser-based search with structured experimental metadata, SearchLog provides a reusable resource to study search behavior across traditional and AI-enhanced search interfaces.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper presents SearchLog, a web browser extension for laboratory studies that allows participants to perform open-web searches while capturing structured interaction data via mouse, keyboard, search activity, and browser state modules. Captured elements include clicks, scrolling, hovered text, typed words, queries, result rankings, AI-generated summaries, tab activity, and window changes. Sessions are stored as ordered JSON event streams by a local Flask backend, with HTML snapshots and preprocessed results, enabling derivation of measures such as query reformulation, dwell time, scroll behavior, tab switching, search path complexity, and exposure to AI content.

Significance. If the described functionality holds, SearchLog provides a reusable, installable tool for collecting natural search logs in controlled lab settings without restricting participants to simulated environments. This is significant for information retrieval research, as it supports detailed analysis of user behavior across traditional and AI-enhanced interfaces and facilitates reproducible experiments on metrics that are otherwise difficult to obtain from proprietary logs.

major comments (1)
  1. [Abstract] The central claim that the extension reliably captures all listed data types (mouse/keyboard events, search queries, result rankings, AI summaries, tab/window changes) as ordered JSON without loss or distortion is presented without any code, validation data, error-handling details, or empirical tests, leaving the implementation claim unverified.
minor comments (2)
  1. The manuscript would benefit from explicit discussion of how the content scripts and Flask backend handle edge cases such as network latency or browser updates to ensure data integrity.
  2. Installation and setup instructions, or a pointer to a public code repository, are missing and would improve the tool's reusability as claimed.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for highlighting the need to substantiate the implementation claims. We address the comment below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the extension reliably captures all listed data types (mouse/keyboard events, search queries, result rankings, AI summaries, tab/window changes) as ordered JSON without loss or distortion is presented without any code, validation data, error-handling details, or empirical tests, leaving the implementation claim unverified.

    Authors: We agree that the abstract asserts reliable capture of the listed data types without supporting evidence or details on validation. The manuscript describes the modular architecture (mouse/keyboard, search activity, browser state) and the Flask backend for ordered JSON storage, but does not include empirical tests, error-handling specifics, or sample validation data. We will revise the abstract to use more precise language (e.g., 'captures' rather than 'reliably captures ... without loss or distortion') and add a dedicated subsection on implementation validation. This will include: (1) description of event listeners and error handling for each module, (2) example JSON event streams from pilot sessions, and (3) discussion of known limitations such as potential loss during high-frequency events or browser restrictions. If space allows, we will include pseudocode for key capture functions. revision: yes
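The limitation around high-frequency events can be illustrated with a simple throttle. This sketch is written in Python to show the logic only; an extension would do this in its content script, and nothing here reflects how SearchLog actually handles mouse movement. The `"mouse"` event name with a `"move"` action is an assumption modelled on the `"keyboard"` and `"write"` pair in the example log, and the 50 ms gap is arbitrary.

```python
def throttle_moves(events, min_gap_ms=50):
    """Keep at most one mouse-move event per min_gap_ms; keep everything else.

    Returns the kept events and the number of move events that were dropped,
    so the amount of loss is itself recorded.
    """
    kept = []
    dropped = 0
    last_move_ts = None
    for e in events:
        # ASSUMPTION: mouse movement is logged as event "mouse", action "move".
        is_move = e.get("event") == "mouse" and e.get("action") == "move"
        if (is_move and last_move_ts is not None
                and e["timestamp"] - last_move_ts < min_gap_ms):
            dropped += 1
            continue
        if is_move:
            last_move_ts = e["timestamp"]
        kept.append(e)
    return kept, dropped


# Made-up burst of mouse movement with a click in the middle.
burst = [
    {"timestamp": 0, "event": "mouse", "action": "move"},
    {"timestamp": 10, "event": "mouse", "action": "move"},
    {"timestamp": 20, "event": "mouse", "action": "move"},
    {"timestamp": 25, "event": "mouse", "action": "click"},
    {"timestamp": 60, "event": "mouse", "action": "move"},
]
kept, dropped = throttle_moves(burst)
```

Returning the drop count alongside the kept events makes the loss explicit, which is the kind of detail a validation subsection would need to report.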

Circularity Check

0 steps flagged

No significant circularity: tool-description paper with no derivations or predictions

The manuscript is a straightforward description of a browser extension artifact and its data-capture modules. No equations, fitted parameters, predictions, uniqueness theorems, or self-citation chains appear in the abstract or described content. All claims concern implementable features (event logging, JSON storage, HTML snapshots) that are internally consistent with standard browser-extension APIs and require no external derivation or reduction to prior fitted results. The contribution is therefore self-contained as an engineering artifact without any load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a software tool presentation paper with no mathematical content, free parameters, axioms, or invented scientific entities.

pith-pipeline@v0.9.1-grok · 5710 in / 1098 out tokens · 50635 ms · 2026-06-28T03:43:05.064757+00:00 · methodology
