SearchLog: A Web Browser Extension for Capturing Search Logs in Laboratory Studies
Pith reviewed 2026-06-28 03:43 UTC · model grok-4.3
The pith
SearchLog is a browser extension that captures mouse, keyboard, search query, and browser state data while users conduct open-web searches in lab studies.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SearchLog allows participants to search the open web using a browser while recording structured interaction data across mouse, keyboard, search activity, and browser state modules. The extension captures clicks, scrolling, hovered text, typed words, search queries, result rankings, AI-generated summaries when available, tab activity, and window changes. A local Flask backend stores each session as an ordered JSON event stream, with HTML snapshots and preprocessed search result data for later analysis.
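As a rough illustration of what "an ordered JSON event stream" could look like, the sketch below appends events to a per-session list and stamps each one with an increasing sequence number. The `SessionLog` class and the field names (`seq`, `ts`, `module`, `type`, `payload`) are hypothetical. The paper's actual schema and Flask routes are not reproduced here, and this version uses only the Python standard library.

```python
from __future__ import annotations

import json
import time
from dataclasses import dataclass, field
from typing import Any

# Module names follow the description above; everything else is assumed.
MODULES = {"mouse", "keyboard", "search", "browser"}


@dataclass
class SessionLog:
    """Hypothetical in-memory store for one study session."""
    session_id: str
    events: list[dict[str, Any]] = field(default_factory=list)

    def append(self, module: str, event_type: str, payload: dict[str, Any],
               ts: float | None = None) -> dict[str, Any]:
        if module not in MODULES:
            raise ValueError(f"unknown module: {module}")
        event = {
            "seq": len(self.events),                  # arrival order at the store
            "ts": time.time() if ts is None else ts,  # supplied or current time
            "module": module,
            "type": event_type,
            "payload": payload,
        }
        self.events.append(event)
        return event

    def to_json(self) -> str:
        """Serialize the whole session as one JSON document."""
        return json.dumps({"session_id": self.session_id, "events": self.events})


log = SessionLog("demo-session")
log.append("search", "query", {"text": "browser extension logging"}, ts=1.0)
log.append("mouse", "click", {"x": 120, "y": 340}, ts=2.5)
restored = json.loads(log.to_json())
```

Numbering events at the store gives a total order even when client timestamps collide, which is one plausible reading of "ordered"; the extension may order events differently.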
What carries the argument
SearchLog browser extension with its four capture modules (mouse, keyboard, search activity, browser state) feeding a local Flask backend that writes ordered JSON event streams.
If this is right
- Researchers can compute query reformulation rates, page visit sequences, dwell times, scroll depths, tab-switching patterns, and exposure to AI-generated summaries from the same log files.
- The same extension supports experiments on both traditional result lists and AI-enhanced search interfaces without changing the participant setup.
- Session metadata can be linked to experimental conditions, enabling controlled comparisons across user groups or interface variants.
- The reusable extension reduces the need for custom logging code in future lab studies of information seeking.
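Two of the measures listed above can be sketched against a toy event stream. The event types (`query`, `page_visit`, `session_end`) and field names are assumptions for illustration, not the SearchLog schema. Dwell time is approximated here as the time from a page visit to the next navigation event, which ignores tab switches and idle periods.

```python
# Hypothetical event stream; the real SearchLog schema may differ.
events = [
    {"ts": 0.0, "type": "query", "payload": {"text": "flask json"}},
    {"ts": 4.0, "type": "page_visit", "payload": {"url": "https://example.com/a"}},
    {"ts": 19.0, "type": "query", "payload": {"text": "flask json logging"}},
    {"ts": 22.0, "type": "page_visit", "payload": {"url": "https://example.com/b"}},
    {"ts": 30.0, "type": "session_end", "payload": {}},
]

NAVIGATION = {"query", "page_visit", "session_end"}


def count_reformulations(stream):
    """Count queries whose normalized text differs from the previous query."""
    queries = [e["payload"]["text"].strip().lower()
               for e in stream if e["type"] == "query"]
    return sum(1 for prev, cur in zip(queries, queries[1:]) if prev != cur)


def dwell_times(stream):
    """Map each visited URL to the seconds until the next navigation event."""
    nav = sorted((e for e in stream if e["type"] in NAVIGATION),
                 key=lambda e: e["ts"])
    result = {}
    for cur, nxt in zip(nav, nav[1:]):
        if cur["type"] == "page_visit":
            url = cur["payload"]["url"]
            result[url] = result.get(url, 0.0) + (nxt["ts"] - cur["ts"])
    return result


reformulations = count_reformulations(events)
dwell = dwell_times(events)
```

For this toy stream, `reformulations` is 1 and `dwell` assigns 15.0 seconds to the first URL and 8.0 seconds to the second.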
Where Pith is reading between the lines
- The structured JSON format could be fed directly into automated analysis pipelines that detect complex search strategies without manual coding.
- If the extension proves stable, labs could run multi-session studies where participants keep the tool installed across days or weeks.
- The capture of AI summaries alongside user actions opens direct measurement of how generated content affects click and reading behavior.
Load-bearing premise
Participants will install and use the extension without changing how they normally search the web, and the local storage will record every event without loss or distortion.
What would settle it
A side-by-side comparison in which the same participants perform identical search tasks with and without SearchLog installed, showing measurable shifts in query length, dwell time, or tab-switching rate, or logs that omit events visible in independent screen recordings.
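The second half of this test, checking logs against screen recordings, could be sketched as a simple matching procedure. The sketch assumes that events hand-coded from a recording and events taken from the log are both reduced to `(timestamp, type)` tuples on a shared clock. The function name and the 0.5-second tolerance are illustrative choices, not part of SearchLog.

```python
def unmatched_reference_events(reference, logged, tolerance=0.5):
    """Return reference events with no same-type logged event within
    `tolerance` seconds. Each logged event is matched at most once."""
    remaining = sorted(logged)
    misses = []
    for ts, kind in sorted(reference):
        match = next((ev for ev in remaining
                      if ev[1] == kind and abs(ev[0] - ts) <= tolerance), None)
        if match is None:
            misses.append((ts, kind))   # visible in the recording, absent from the log
        else:
            remaining.remove(match)     # consume the match so it is not reused
    return misses


# Toy values for illustration only.
reference = [(1.0, "click"), (3.2, "tab_switch"), (7.5, "click")]
logged = [(1.1, "click"), (7.4, "click")]
misses = unmatched_reference_events(reference, logged)
```

Here `misses` contains only the tab switch at 3.2 seconds. A real audit would also need clock alignment between the recording and the log.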
Original abstract
Natural search logs are valuable for studying search behavior in information seeking settings. We present SearchLog, an easy-to-install web browser extension for collecting natural search logs during lab-based studies. SearchLog allows participants to search the open web using a browser while recording structured interaction data across mouse, keyboard, search activity, and browser state modules. The extension captures clicks, scrolling, hovered text, typed words, search queries, result rankings, AI-generated summaries when available, tab activity, and window changes. A local Flask backend stores each session as an ordered JSON event stream, with HTML snapshots and preprocessed search result data for later analysis. These logs can be used to derive measures such as query reformulation, page visits, dwell time, scroll behavior, tab switching, search path complexity, and exposure to AI-generated search content. By supporting natural browser-based search with structured experimental metadata, SearchLog provides a reusable resource to study search behavior across traditional and AI-enhanced search interfaces.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents SearchLog, a web browser extension for laboratory studies that allows participants to perform open-web searches while capturing structured interaction data via mouse, keyboard, search activity, and browser state modules. Captured elements include clicks, scrolling, hovered text, typed words, queries, result rankings, AI-generated summaries, tab activity, and window changes. Sessions are stored as ordered JSON event streams by a local Flask backend, with HTML snapshots and preprocessed results, enabling derivation of measures such as query reformulation, dwell time, scroll behavior, tab switching, search path complexity, and exposure to AI content.
Significance. If the described functionality holds, SearchLog provides a reusable, installable tool for collecting natural search logs in controlled lab settings without restricting participants to simulated environments. This is significant for information retrieval research, as it supports detailed analysis of user behavior across traditional and AI-enhanced interfaces and facilitates reproducible experiments on metrics that are otherwise difficult to obtain from proprietary logs.
major comments (1)
- [Abstract] The central claim that the extension reliably captures all listed data types (mouse/keyboard events, search queries, result rankings, AI summaries, tab/window changes) as ordered JSON without loss or distortion is presented without any code, validation data, error-handling details, or empirical tests, leaving the implementation claim unverified.
minor comments (2)
- The manuscript would benefit from explicit discussion of how the content scripts and Flask backend handle edge cases such as network latency or browser updates to ensure data integrity.
- Installation and setup instructions, or a pointer to a public code repository, are missing and would improve the tool's reusability as claimed.
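One way the data-integrity concern in the first minor comment could be made checkable is sketched below. It assumes each event carries a per-session client-side counter (called `client_seq` here, a hypothetical field) that starts at 0 and increases by 1 per event, so the backend can report gaps and duplicates after a session. Nothing in the source confirms that SearchLog does this.

```python
def audit_sequence(client_seqs):
    """Return (missing, duplicates) for a list of client-side counters.

    Assumes counters start at 0 and increase by 1 per event. Events lost
    after the highest received counter cannot be detected this way.
    """
    seen = set()
    duplicates = []
    for s in client_seqs:
        if s in seen:
            duplicates.append(s)   # e.g. a retried POST after a timeout
        seen.add(s)
    expected = range(max(seen) + 1) if seen else range(0)
    missing = [s for s in expected if s not in seen]
    return missing, duplicates


# Counters as they might arrive at the backend for one session.
missing, duplicates = audit_sequence([0, 1, 3, 3, 5])
```

For this input, `missing` is `[2, 4]` and `duplicates` is `[3]`.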
Simulated Author's Rebuttal
We thank the referee for highlighting the need to substantiate the implementation claims. We address the comment below and will revise the manuscript accordingly.
Referee: [Abstract] The central claim that the extension reliably captures all listed data types (mouse/keyboard events, search queries, result rankings, AI summaries, tab/window changes) as ordered JSON without loss or distortion is presented without any code, validation data, error-handling details, or empirical tests, leaving the implementation claim unverified.
Authors: We agree that the abstract asserts reliable capture of the listed data types without supporting evidence or details on validation. The manuscript describes the modular architecture (mouse/keyboard, search activity, browser state) and the Flask backend for ordered JSON storage, but does not include empirical tests, error-handling specifics, or sample validation data. We will revise the abstract to use more precise language (e.g., 'captures' rather than 'reliably captures ... without loss or distortion') and add a dedicated subsection on implementation validation. This will include: (1) description of event listeners and error handling for each module, (2) example JSON event streams from pilot sessions, and (3) discussion of known limitations such as potential loss during high-frequency events or browser restrictions. If space allows, we will include pseudocode for key capture functions.
revision: yes
Circularity Check
No significant circularity: tool-description paper with no derivations or predictions
The manuscript is a straightforward description of a browser extension artifact and its data-capture modules. No equations, fitted parameters, predictions, uniqueness theorems, or self-citation chains appear in the abstract or described content. All claims concern implementable features (event logging, JSON storage, HTML snapshots) that are internally consistent with standard browser-extension APIs and require no external derivation or reduction to prior fitted results. The contribution is therefore self-contained as an engineering artifact without any load-bearing circular steps.