arxiv: 2606.20910 · v1 · pith:57X4NAHGnew · submitted 2026-06-18 · 💻 cs.CR · cs.AI

Whose Agent Are You? Multi-Layer Fingerprinting and Attribution of Autonomous Web Agents

Pith reviewed 2026-06-26 16:33 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords web agents · fingerprinting · AI agents · attribution · web security · browser behavior · network fingerprinting · decision tree

The pith

Multi-layer fingerprints built from network and browser behavior distinguish AI web agents from humans and legacy crawlers with a reported 97 percent accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that autonomous AI web agents, which pair large language models with browser-level control, produce detectable structural patterns in their TLS and HTTP connections plus their browser interaction sequences. These patterns differ enough across agent frameworks to support reliable attribution, which matters because existing defenses such as robots.txt are routinely ignored and traditional bot detection can be evaded. By logging and classifying traffic from six major agent systems, the authors show that a decision tree model can isolate individual agent architectures while separating them from both human browsing and older crawlers. The approach relies on passive, cross-layer observations rather than active blocking, offering a potential route to enforce content access policies on instrumented domains.

Core claim

The authors demonstrate that AI web agents can be effectively distinguished from humans and traditional crawlers using a multi-layer fingerprint based on both network layer characteristics (e.g., TLS, HTTP) and browser interaction behavior. By analyzing six prominent agent frameworks, they uncover latent structural differences in how these systems assemble HTTP requests, establish TLS/HTTP connections, and execute autonomous browser actions. Feeding these multi-layer features into a decision tree classifier achieves 97 percent accuracy in identification, isolating distinct agent architectures.
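
One way to picture the "request assembly" signal is a fingerprint over the order in which a client sends its header names. The sketch below is an illustration under stated assumptions: `header_order_fingerprint` is a hypothetical helper, the header lists are made up, and the paper's actual feature set is not enumerated in this summary.

```python
import hashlib


def header_order_fingerprint(header_names):
    """Hash the ordered, lower-cased header names of one request.

    Hypothetical helper: order is preserved on purpose, because two clients
    can send the same set of headers in a different sequence.
    """
    canonical = ",".join(name.strip().lower() for name in header_names)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Illustrative (made-up) header sequences for two clients.
client_a = ["Host", "User-Agent", "Accept", "Accept-Encoding"]
client_b = ["Host", "Accept", "User-Agent", "Accept-Encoding"]

fp_a = header_order_fingerprint(client_a)
fp_b = header_order_fingerprint(client_b)
print(fp_a != fp_b)  # same header set, different order
```

Hashing keeps the feature compact for logging; a classifier could equally consume the raw ordered list.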

What carries the argument

A multi-layer fingerprint that combines network-layer details (TLS and HTTP characteristics) with browser interaction behavior, classified by a decision tree.
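
To make the classifier step concrete, here is a minimal pure-Python decision tree using Gini impurity. It is a sketch, not the authors' model: the three feature columns, their values, and the three class labels are invented for illustration, and the paper's tree may be built with a different library and criterion.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a small tree; a leaf is the majority label of its samples."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )


def predict(tree, row):
    """Walk from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Made-up rows: [tls_extension_count, mean_inter_request_s, mouse_moves_per_session]
X = [[12, 0.2, 0], [12, 0.3, 1], [16, 1.5, 900], [16, 1.8, 1100], [9, 0.01, 0], [9, 0.02, 0]]
y = ["agent", "agent", "human", "human", "crawler", "crawler"]

tree = build_tree(X, y)
print(predict(tree, [12, 0.25, 0]))
```

A tree is a natural fit for mixed features like these because each split reads as a rule on one layer, which also makes the result easier to audit.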

If this is right

  • Web servers can deploy the logging framework on live domains to attribute incoming agent traffic to specific frameworks.
  • Content owners gain an evasion-resistant method to enforce access policies against automated scraping.
  • Different agent architectures become distinguishable even when they share similar high-level goals.
  • Agent traffic can be separated from human browsing baselines without relying on user-agent strings or robots.txt compliance.
  • Legacy crawler detection improves when multi-layer signals supplement existing heuristics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Operators of AI agents may need to introduce deliberate randomization in request patterns to reduce identifiability.
  • The same cross-layer approach could be applied to other autonomous systems that control browsers or make network calls.
  • Widespread deployment might shift the arms race toward agents that actively mimic human timing and connection behavior.
  • Attribution data collected this way could inform policy debates on regulating large-scale automated web access.

Load-bearing premise

The observed structural differences across the six tested agent frameworks stay consistent enough to serve as stable identifiers even if the agents are updated or reconfigured.

What would settle it

Retraining the classifier on traffic from modified or newly released versions of the same agent frameworks and measuring whether accuracy falls below 80 percent would test whether the fingerprints remain reliable.
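
Whatever training protocol is used, the check reduces to scoring a classifier on traces from the updated frameworks and comparing the result with the 80 percent bar. The sketch below holds the model fixed, which is one possible protocol and an assumption on our part; `frozen_model`, the trace fields, and the labels are placeholders.

```python
def accuracy(predict_fn, traces, labels):
    """Fraction of traces whose predicted label matches the true label."""
    hits = sum(1 for x, y in zip(traces, labels) if predict_fn(x) == y)
    return hits / len(labels)


def fingerprint_holds(predict_fn, new_traces, new_labels, threshold=0.80):
    """Score a classifier on traffic from updated agents.

    Nothing is refitted here: the question is whether features learned on
    older versions still identify the newer ones.
    """
    acc = accuracy(predict_fn, new_traces, new_labels)
    return acc, acc >= threshold


# Stand-in classifier and made-up traces, only to show the call shape.
def frozen_model(trace):
    return "agent_a" if trace["tls_ext_count"] == 12 else "agent_b"


new_traces = [
    {"tls_ext_count": 12},
    {"tls_ext_count": 14},
    {"tls_ext_count": 12},
    {"tls_ext_count": 12},
]
new_labels = ["agent_a", "agent_b", "agent_a", "agent_b"]

acc, ok = fingerprint_holds(frozen_model, new_traces, new_labels)
print(acc, ok)
```

In this toy run one of four traces is misattributed, so the accuracy lands under the threshold and the check fails.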

Figures

Figures reproduced from arXiv: 2606.20910 by Amir Houmansadr, Dayeon Kang, Hyejun Jeong, Jade Sheffey, Pubali Datta.

Figure 1: Overview of MARK. Our Multi-layer Agent fingerprinting framewoRK (MARK) consists of four stages: (1) configuring agents with a common web-interaction instruction, (2) collecting raw network and client-side traces as each agent visits URLs and performs controlled UX tasks, (3) extracting TLS, HTTP, and behavioral features from the segmented traces, and (4) using the resulting feature vectors to attribute tr…
Figure 2: Comparison of request timing (Inter-Request-Intervals and Inter-Event-Intervals). A similar tendency between IRI and IEI indicates the pacing strategies of agents.
Figure 3: Comparison results of the request rate sent by the agent and its Coefficient of Variation. This feature shows the stability and pacing of each agent.
Figure 4: Mean and standard deviation values of mouse trajectory length. It implies agent type and strategy for human-mimicking behavior.
Figure 5: Web component interaction proportion for each web agent on different pages. The event profile proportion suggests a web content exploration strategy of agents, and it is different for agents even though the network packet structures are identical. (S5 webpage is skipped because Claude, Gemini, and Skyvern have no record for it.)
Figure 6: The key downs and mouse movements per session clustered regions across agents. Agents generally use keyboard controls rather than mouse movements, except for Skyvern, which presents human-like behavior.
Figure 7: Agent identification performance comparison as the number of requests seen at the website end. Protocol-level fingerprints preserve more than 60% agent discrimination after 3 requests are received, and behavioral replenishes the performance after signals are accumulated.
Figure 8: Task instruction for autonomous web interaction. It guides the agent to sequentially visit each target URL, inspect the page, perform one natural interaction with synthetic inputs, and immediately proceed to the next page.
Figure 9: Scenario 1, version 1
Figure 13: Scenario 3
Figure 14: Scenario 4
Figure 15: Scenario 5
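
The timing features named in the captions of Figures 2 and 3 (inter-request intervals, inter-event intervals, and a coefficient of variation) can be computed from plain timestamps. This is a minimal sketch with made-up data; using the population standard deviation is an assumption, since the summary does not say which estimator the authors use.

```python
from statistics import mean, pstdev


def intervals(timestamps):
    """Gaps in seconds between consecutive event times, after sorting."""
    ordered = sorted(timestamps)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    m = mean(values)
    return pstdev(values) / m if m else float("nan")


# Made-up request timestamps for one session with perfectly even pacing.
request_times = [0.0, 1.0, 2.0, 3.0, 4.0]
iri = intervals(request_times)
cov = coefficient_of_variation(iri)
print(iri, cov)
```

The same two functions apply to client-side event timestamps to obtain inter-event intervals; a low coefficient of variation indicates steady, machine-like pacing.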
Original abstract

As AI web agents proliferate, combining large language models with autonomous, browser-level control, indiscriminate content scraping by web agents has emerged as a privacy and security challenge. Existing defenses, such as robots.txt and active bot-blocking, are insufficient, as they are widely violated and easily circumvented. In this work, we demonstrate that AI web agents can be effectively distinguished from humans and traditional crawlers using a multi-layer fingerprint based on both network layer characteristics (e.g., TLS, HTTP) and browser interaction behavior. We implement this mechanism as a programmatic logging framework that can be deployed on a live, instrumented domain. By analyzing six prominent agent frameworks (AutoGen, Browser Use, Claude, Gemini, Operator, and Skyvern), we uncover latent structural differences in how these systems assemble HTTP requests, establish TLS/HTTP connections, and execute autonomous browser actions. Feeding these multi-layer features into a decision tree classifier, our framework achieves high-fidelity identification (97% accuracy), successfully isolating distinct agent architectures and differentiating agent traffic from both human browsing baselines and legacy crawlers. Our findings demonstrate that cross-layer agent tracking provides a robust, evasion-resistant strategy for content protection and web security policy enforcement.
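
For readers unfamiliar with TLS fingerprinting, a JA3-style hash is one widely used way to summarize a ClientHello: five fields (version, cipher suites, extensions, elliptic curves, point formats) are joined into a string and hashed with MD5. Whether the paper's TLS features follow this layout is an assumption; the function names and the numeric values below are illustrative only.

```python
import hashlib


def ja3_style_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3-layout string: five comma-separated fields, with the
    values inside each field joined by '-'."""
    fields = [[version], ciphers, extensions, curves, point_formats]
    return ",".join("-".join(str(v) for v in field) for field in fields)


def ja3_style_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 hex digest of the JA3-layout string."""
    text = ja3_style_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


# Made-up ClientHello values; only the cipher order differs between the two.
hello_a = (771, [4865, 4866], [0, 10, 11], [29, 23], [0])
hello_b = (771, [4866, 4865], [0, 10, 11], [29, 23], [0])

print(ja3_style_string(*hello_a))
print(ja3_style_hash(*hello_a) != ja3_style_hash(*hello_b))
```

Because the field order is preserved, two clients offering the same cipher suites in a different order produce different hashes.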

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces a multi-layer fingerprinting framework for identifying autonomous AI web agents. It collects features from TLS/HTTP request assembly and browser action sequences across six agent frameworks (AutoGen, Browser Use, Claude, Gemini, Operator, Skyvern), along with human and legacy crawler baselines. These features are fed into a decision tree classifier, which the authors report achieves 97% accuracy in attributing agent traffic and distinguishing it from human browsing and traditional crawlers. The approach is implemented as a logging framework for live domains and positioned as an evasion-resistant defense against indiscriminate scraping.

Significance. If the central performance and stability claims hold after additional validation, the work would supply web operators with a deployable, cross-layer attribution method that addresses limitations of robots.txt and simple bot blockers. The identification of persistent structural differences in agent HTTP/TLS and interaction patterns could support more targeted security policies as LLM-based agents become common.

major comments (3)
  1. [Abstract and Evaluation] The claim of 97% accuracy on six frameworks is presented without any information on dataset size, trace collection methodology, cross-validation procedure, error bars, or statistical significance. This absence makes it impossible to evaluate whether the decision tree result supports the attribution and evasion-resistance conclusions.
  2. [Experimental Evaluation] All traces are collected from unmodified default instances of the six frameworks. No tests alter configuration parameters, inject custom headers, change navigation policies, or evaluate updated framework releases. Because the central claim requires that the observed multi-layer differences remain reliable identifiers under reconfiguration, the current results do not substantiate the 'evasion-resistant' assertion.
  3. [Methodology] The manuscript does not enumerate the precise multi-layer feature set, the extraction code, or any feature-importance ranking from the decision tree. Without these details the reported accuracy cannot be reproduced or attributed to specific layers (TLS vs. browser actions).
minor comments (2)
  1. [Abstract] The abstract lists framework names with inconsistent formatting (e.g., 'Browser Use' versus single-word names); standardize naming throughout.
  2. [Implementation] No reference is given to the programmatic logging framework implementation or to any public artifact that would allow independent verification of the feature collection pipeline.
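
The first major comment asks for error bars on the 97% figure. As an illustration of why the test-set size matters, the sketch below computes a Wilson score interval for an observed accuracy. The sample sizes are hypothetical, since the summary does not state how many traces were evaluated.

```python
from math import sqrt


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Hypothetical counts: 97% accuracy observed on test sets of different sizes.
for n in (100, 1000):
    low, high = wilson_interval(round(0.97 * n), n)
    print(n, round(low, 3), round(high, 3))
```

The interval narrows as the number of evaluated traces grows, so the same headline accuracy carries very different weight at different sample sizes.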

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We agree that additional details on the experimental methodology, dataset, and features are needed to support the claims, and we will revise the manuscript to address these points. Below we respond to each major comment.

  1. Referee: [Abstract and Evaluation] The claim of 97% accuracy on six frameworks is presented without any information on dataset size, trace collection methodology, cross-validation procedure, error bars, or statistical significance. This absence makes it impossible to evaluate whether the decision tree result supports the attribution and evasion-resistance conclusions.

    Authors: We agree that the current presentation lacks sufficient experimental details for proper evaluation. In the revised manuscript we will expand the Evaluation section (and update the abstract if space permits) to report the total number of traces collected per framework and baseline, the precise trace collection methodology and environment, the cross-validation procedure (including number of folds), accuracy with error bars or confidence intervals, and any statistical significance tests performed on the 97% result.

    revision: yes

  2. Referee: [Experimental Evaluation] All traces are collected from unmodified default instances of the six frameworks. No tests alter configuration parameters, inject custom headers, change navigation policies, or evaluate updated framework releases. Because the central claim requires that the observed multi-layer differences remain reliable identifiers under reconfiguration, the current results do not substantiate the 'evasion-resistant' assertion.

    Authors: The referee is correct that the experiments used only default configurations. While the multi-layer differences we observed arise from fundamental architectural choices in request assembly and browser control (which are not trivially reconfigurable without breaking core agent functionality), we acknowledge that the evasion-resistance claim would be stronger with explicit tests of modified settings. In revision we will add a dedicated limitations subsection discussing potential evasion vectors and, to the extent feasible with the existing trace collection infrastructure, include preliminary results on a small set of reconfigured instances. We will also tone down the 'evasion-resistant' phrasing to 'resistant to simple evasion under standard usage' where appropriate.

    revision: partial

  3. Referee: [Methodology] The manuscript does not enumerate the precise multi-layer feature set, the extraction code, or any feature-importance ranking from the decision tree. Without these details the reported accuracy cannot be reproduced or attributed to specific layers (TLS vs. browser actions).

    Authors: We agree that the lack of feature-level detail hinders reproducibility. In the revised version we will add an explicit table or enumerated list of all multi-layer features (grouped by TLS handshake, HTTP request construction, and browser action sequences), describe the extraction logic in sufficient detail for reproduction, and report the feature-importance scores from the trained decision tree so readers can see the relative contribution of each layer.

    revision: yes
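
One way to attribute accuracy to a layer, independent of any particular tree library, is permutation importance: shuffle one layer's columns across traces and record how far accuracy drops. The sketch below is illustrative only. The layer grouping, feature names, rows, and stand-in model are all hypothetical, and this is not necessarily the importance measure the authors will report.

```python
import random

# Hypothetical grouping; the names are placeholders, not the paper's features.
LAYERS = {
    "tls": ["tls_ext_count"],
    "http": ["header_order_id"],
    "behavior": ["mouse_moves", "keydowns"],
}


def accuracy(predict_fn, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict_fn(r) == y for r, y in zip(rows, labels)) / len(labels)


def layer_importance(predict_fn, rows, labels, layers, seed=0):
    """Accuracy drop when each layer's columns are shuffled across rows."""
    rng = random.Random(seed)
    base = accuracy(predict_fn, rows, labels)
    drops = {}
    for layer, names in layers.items():
        order = list(range(len(rows)))
        rng.shuffle(order)
        # Copy each row, taking this layer's values from a shuffled donor row.
        shuffled = [dict(r, **{n: rows[j][n] for n in names}) for r, j in zip(rows, order)]
        drops[layer] = base - accuracy(predict_fn, shuffled, labels)
    return drops


# Stand-in model that only looks at the TLS feature.
def model(row):
    return "agent_a" if row["tls_ext_count"] == 12 else "agent_b"


rows = [
    {"tls_ext_count": 12, "header_order_id": 1, "mouse_moves": 0, "keydowns": 40},
    {"tls_ext_count": 14, "header_order_id": 2, "mouse_moves": 5, "keydowns": 10},
    {"tls_ext_count": 12, "header_order_id": 1, "mouse_moves": 0, "keydowns": 35},
    {"tls_ext_count": 14, "header_order_id": 2, "mouse_moves": 7, "keydowns": 12},
]
labels = ["agent_a", "agent_b", "agent_a", "agent_b"]

drops = layer_importance(model, rows, labels, LAYERS)
print(drops)
```

Because the stand-in model ignores the HTTP and behavioral columns, shuffling them cannot change its predictions, so only the TLS layer can show a nonzero drop.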

Circularity Check

0 steps flagged

No circularity: empirical classification on observed features

The paper reports an empirical ML result: multi-layer features (TLS/HTTP assembly, browser actions) are extracted from traces of six agent frameworks plus baselines, then fed to a decision tree yielding 97% accuracy. No equations, no fitted parameters renamed as predictions, no self-citations invoked for uniqueness theorems or ansatzes, and no reduction of the central claim to its own inputs by construction. The accuracy is a direct performance metric on the collected data distribution; the derivation chain is self-contained standard supervised learning.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Empirical ML classification study; relies on standard assumptions about feature stability and data representativeness rather than new theoretical constructs.

free parameters (1)
  • decision tree hyperparameters
    Parameters of the classifier are fitted to the collected agent and baseline traffic data.
axioms (1)
  • domain assumption: Observed differences in TLS/HTTP assembly and browser actions are stable identifiers for the tested agent frameworks
    Invoked when claiming the features enable high-accuracy classification.

pith-pipeline@v0.9.1-grok · 5756 in / 1172 out tokens · 37073 ms · 2026-06-26T16:33:03.229852+00:00 · methodology


Reference graph

Works this paper leans on

81 extracted references · 3 canonical work pages

  1. [1]

    150+ ai agents statistics: What business leaders are betting on in 2026,

    I. Pohrebniyak, “150+ ai agents statistics: What business leaders are betting on in 2026,” https://masterofcode.com/blog/ai-agent-statistics, 2026, [Accessed 02-05-2026]

  2. [2]

    Who’s adopting ai agents—and what they’re actually doing with them,

    J. Yang, “Who’s adopting ai agents—and what they’re actually doing with them,” https://www.library.hbs.edu/working-knowledge/whos-a dopting-ai-agents-and-what-theyre-actually-doing-with-them, 2026, [Accessed 12-06-2026]

  3. [3]

    Robots Exclusion Protocol,

    M. Koster, G. Illyes, H. Zeller, and L. Sassman, “Robots Exclusion Protocol,” RFC 9309, Sep. 2022. [Online]. Available: https://www.rfc-editor.org/info/rfc9309

  4. [4]

    The odyssey of robots.txt governance: Measuring convention implications of web bots in Large Language Model services,

    J. Cui, M. Zha, X. Wang, and X. Liao, “The odyssey of robots.txt governance: Measuring convention implications of web bots in Large Language Model services,” inProceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’25. New York, NY , USA: Association for Computing Machinery, 2025, p. 21–35. [Online]. Available: ht...

  5. [5]

    Scrapers selectively respect robots.txt directives: Evidence from a large-scale empirical study,

    T. Kim, K. Bock, C. Luo, A. Liswood, C. Poroslay, and E. Wenger, “Scrapers selectively respect robots.txt directives: Evidence from a large-scale empirical study,” inProceedings of the 2025 ACM Internet Measurement Conference, ser. IMC ’25. New York, NY , USA: Association for Computing Machinery, 2025, p. 541–557. [Online]. Available: https://doi.org/10.1...

  6. [6]

    Somesite i used to crawl: Awareness, agency and efficacy in protecting content creators from AI crawlers,

    E. Liu, E. Luo, S. Shan, G. M. V oelker, B. Y . Zhao, and S. Savage, “Somesite i used to crawl: Awareness, agency and efficacy in protecting content creators from AI crawlers,” inProceedings of the 2025 ACM Internet Measurement Conference, ser. IMC ’25. New York, NY , USA: Association for Computing Machinery, 2025, p. 78–99. [Online]. Available: https://d...

  7. [7]

    Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives — blog.cloudflare.com,

    G. Corral, V . Singhal, B. Mitchell, and R. Tatoris, “Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives — blog.cloudflare.com,” https://blog.cloudflare.com/perplexity-is-using -stealth-undeclared-crawlers-to-evade-website-no-crawl-directives/, 2025, [Accessed 21-04-2026]

  8. [8]

    Trapping misbehaving bots in an AI Labyrinth — blog.cloudflare.com,

    R. Tatoris, H. Saxena, and L. Miglietti, “Trapping misbehaving bots in an AI Labyrinth — blog.cloudflare.com,” https://blog.cloudflare.c om/ai-labyrinth/, 2025, [Accessed 02-05-2026]

  9. [9]

    Statistical identification of encrypted web browsing traf- fic,

    Q. Sun, D. R. Simon, Y .-M. Wang, W. Russell, V . N. Padmanabhan, and L. Qiu, “Statistical identification of encrypted web browsing traf- fic,” inProceedings 2002 IEEE Symposium on Security and Privacy. IEEE, 2002, pp. 19–30

  10. [10]

    Website fingerprinting at internet scale

    A. Panchenko, F. Lanze, J. Pennekamp, T. Engel, A. Zinnen, M. Henze, and K. Wehrle, “Website fingerprinting at internet scale.” inNDSS, vol. 1, 2016, p. 23477

  11. [11]

    k-fingerprinting: A robust scalable web- site fingerprinting technique,

    J. Hayes and G. Danezis, “k-fingerprinting: A robust scalable web- site fingerprinting technique,” in25th USENIX Security Symposium (USENIX Security 16), 2016, pp. 1187–1203

  12. [12]

    Fp-stalker: Tracking browser fingerprint evolutions,

    A. Vastel, P. Laperdrix, W. Rudametkin, and R. Rouvoy, “Fp-stalker: Tracking browser fingerprint evolutions,” in2018 IEEE Symposium on Security and Privacy (SP). IEEE, 2018, pp. 728–741

  13. [13]

    How unique is your web browser?

    P. Eckersley, “How unique is your web browser?” inInterna- tional Symposium on Privacy Enhancing Technologies Symposium. Springer, 2010, pp. 1–18

  14. [14]

    Long-term observation on browser fingerprinting: Users’ trackability and per- spective,

    G. Pugliese, C. Riess, F. Gassmann, and Z. Benenson, “Long-term observation on browser fingerprinting: Users’ trackability and per- spective,”Proceedings on Privacy Enhancing Technologies, 2020

  15. [15]

    Tracking users on the internet with behavioral patterns: Evaluation of its practical feasibil- ity,

    C. Banse, D. Herrmann, and H. Federrath, “Tracking users on the internet with behavioral patterns: Evaluation of its practical feasibil- ity,” inIFIP International Information Security Conference. Springer, 2012, pp. 235–248

  16. [16]

    A novel attack to track users based on the behavior patterns,

    X. Gu, M. Yang, C. Shi, Z. Ling, and J. Luo, “A novel attack to track users based on the behavior patterns,”Concurrency and Computation: Practice and Experience, vol. 29, no. 6, p. e3891, 2017

  17. [17]

    Behavior-based track- ing: Exploiting characteristic patterns in dns traffic,

    D. Herrmann, C. Banse, and H. Federrath, “Behavior-based track- ing: Exploiting characteristic patterns in dns traffic,”Computers & Security, vol. 39, pp. 17–33, 2013

  18. [18]

    Web user behavioral profiling for user identification,

    Y . C. Yang, “Web user behavioral profiling for user identification,” Decision Support Systems, vol. 49, no. 3, pp. 261–271, 2010

  19. [19]

    Web page revisitation revisited: implications of a long-term click-stream study of browser usage,

    H. Obendorf, H. Weinreich, E. Herder, and M. Mayer, “Web page revisitation revisited: implications of a long-term click-stream study of browser usage,” inProceedings of the SIGCHI conference on Human factors in computing systems, 2007, pp. 597–606

  20. [20]

    Browsing unicity: On the limits of anonymizing web tracking data,

    C. Deußer, S. Passmann, and T. Strufe, “Browsing unicity: On the limits of anonymizing web tracking data,” in2020 IEEE Symposium on Security and Privacy (SP). IEEE, 2020, pp. 777–790

  21. [21]

    Fingerprint Launches Automation Intelligence API and AI Assistant Detection, Delivering the Industry’s Most Complete View of AI Traffic,

    Fingerprint, “Fingerprint Launches Automation Intelligence API and AI Assistant Detection, Delivering the Industry’s Most Complete View of AI Traffic,” https://www.businesswire.com/news/home/2 0260601158287/en/Fingerprint-Launches-Automation-Intelligenc e-API-and-AI-Assistant-Detection-Delivering-the-Industrys-Most-C omplete-View-of-AI-Traffic, 2026, [Acc...

  22. [22]

    Skyvern: Automate browser-based workflows with AI,

    Skyvern-AI, “Skyvern: Automate browser-based workflows with AI,” https://github.com/Skyvern-AI/skyvern, 2026, accessed: 2026-05-01

  23. [23]

    Browser Use: Enable AI to control your browser,

    M. M ¨uller and G. ˇZuniˇc, “Browser Use: Enable AI to control your browser,” https://github.com/browser-use/browser-use, 2024

  24. [24]

    AutoGen: Enabling next-gen LLM applications via multi- agent conversations,

    Q. Wu, G. Bansal, J. Zhang, Y . Wu, B. Li, E. Zhu, L. Jiang, X. Zhang, S. Zhang, J. Liu, A. H. Awadallah, R. W. White, D. Burger, and C. Wang, “AutoGen: Enabling next-gen LLM applications via multi- agent conversations,” inCOLM, 2024

  25. [25]

    Introducing Operator,

    OpenAI, “Introducing Operator,” https://openai.com/index/introduci ng-operator/, Jan. 2025, accessed: 2026-05-01

  26. [26]

    Computer use tool,

    Anthropic, “Computer use tool,” https://platform.claude.com/docs/e n/agents-and-tools/tool-use/computer-use-tool, 2026, claude API documentation. Accessed: 2026-05-01

  27. [27]

    Computer use,

    Google, “Computer use,” https://ai.google.dev/gemini-api/docs/comp uter-use, 2026, gemini API documentation. Last updated: 2026-04-

  28. [28]

    Accessed: 2026-05-01

  29. [29]

    Apache Nutch,

    A. N. P. M. Committee, “Apache Nutch,” https://nutch.apache.org/, [Accessed 11-06-2026]

  30. [30]

    Heritrix,

    internetarchive/heritrix3, “Heritrix,” https://github.com/internetarchi ve/heritrix3, [Accessed 11-06-2026]

  31. [31]

    Scrapy — open source web scraping framework for Python,

    Scrapy, “Scrapy — open source web scraping framework for Python,” https://www.scrapy.org/, [Accessed 11-06-2026]

  32. [32]

    Breaking agent backbones: Evaluating the security of backbone LLMs in AI agents,

    J. Bazinska, M. Mathys, F. Casucci, M. Rojas-Carulla, X. Davies, A. Souly, and N. Pfister, “Breaking agent backbones: Evaluating the security of backbone LLMs in AI agents,” inThe Fourteenth International Conference on Learning Representations, 2026

  33. [33]

    ReAct: Synergizing reasoning and acting in language models,

    S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y . Cao, “ReAct: Synergizing reasoning and acting in language models,” in ICLR, 2023

  34. [34]

    V oyager: An open-ended embodied agent with Large Language Models,

    G. Wang, Y . Xie, Y . Jiang, A. Mandlekar, C. Xiao, Y . Zhu, L. Fan, and A. Anandkumar, “V oyager: An open-ended embodied agent with Large Language Models,”Transactions on Machine Learning Research, 2024. [Online]. Available: https://openreview.net/forum?i d=ehfRiF0R3a

  35. [35]

    OmniTool: Computer use with OmniParser,

    Microsoft, “OmniTool: Computer use with OmniParser,” https://gith ub.com/microsoft/OmniParser/blob/master/omnitool/readme.md, 2025, accessed: 2026-05-01

  36. [36]

    SWE-agent: Agent-computer interfaces enable automated software engineering,

    J. Yang, C. E. Jimenez, A. Wettig, K. Lieret, S. Yao, K. R. Narasimhan, and O. Press, “SWE-agent: Agent-computer interfaces enable automated software engineering,” inNeurIPS, 2024, pp. 50 528–50 652. [Online]. Available: https://arxiv.org/abs/2405.15793

  37. [37]

    Build software with AI agents,

    Cursor, “Build software with AI agents,” https://cursor.com/product, 2026, accessed: 2026-05-01

  38. [38]

    Accessed: 2026-05-01

    OpenAI, “Codex,” https://developers.openai.com/codex, 2026, openAI Developers documentation. Accessed: 2026-05-01

  39. [39]

    Claude Code overview,

    Anthropic, “Claude Code overview,” https://code.claude.com/docs/en/ overview, 2026, claude Code documentation. Accessed: 2026-05-01

  40. [40]

    Introducing devin, the first AI software engineer,

    Cognition, “Introducing devin, the first AI software engineer,” https: //cognition.ai/blog/introducing-devin, Mar. 2024, accessed: 2026-05- 01

  41. [41]

    GPT researcher,

    A. Elovic, “GPT researcher,” code repository: https://github.com/ass afelovic/gpt-researcher. [Online]. Available: https://gptr.dev

  42. [42]

    Introducing Deep Research,

    OpenAI, “Introducing Deep Research,” https://openai.com/index/int roducing-deep-research/, 2025

  43. [43]

    Gemini Deep Research — your personal research assistant,

    Google, “Gemini Deep Research — your personal research assistant,” https://gemini.google/overview/deep-research/, 2025, accessed: 2025- 07-16

  44. [44]

    Are AI agents interacting with online ads?

    A. St ¨ockl and J. Nitu, “Are AI agents interacting with online ads?” arXiv preprint arXiv:2504.07112, 2025

  45. [45]

    How Skyvern reads and understands the web,

    S. Singh, “How Skyvern reads and understands the web,” https://ww w.skyvern.com/blog/how-skyvern-reads-and-understands-the-web/, Jul. 2025, accessed: 2026-05-01

  46. [46]

    Computer-using agent,

    OpenAI, “Computer-using agent,” https://openai.com/index/compute r-using-agent/, Jan. 2025, accessed: 2026-05-01

  47. [47]

    HTTPS traffic anal- ysis and client identification using passive SSL/TLS fingerprinting,

    M. Hus ´ak, M. ˇCerm´ak, T. Jirs´ık, and P. ˇCeleda, “HTTPS traffic anal- ysis and client identification using passive SSL/TLS fingerprinting,” EURASIP Journal on Information Security, vol. 2016, p. 6, 2016

  48. [48]

    TLS fingerprinting with JA3 and JA3S,

    J. Althouse, J. Atkinson, and J. Atkins, “TLS fingerprinting with JA3 and JA3S,” Salesforce Engineering Blog, Jan. 2019, https://engineer ing.salesforce.com/tls-fingerprinting-with-ja3-and-ja3s-247362855 967/

  49. [49]

    JA4+ network fingerprinting,

    J. Althouse, “JA4+ network fingerprinting,” FoxIO Blog, Sep. 2023, https://foxio.io/blog/ja4-network-fingerprinting

  50. [50]

    The use of TLS in censorship circum- vention,

    S. Frolov and E. Wustrow, “The use of TLS in censorship circum- vention,” inProceedings of the 26th Annual Network and Distributed System Security Symposium (NDSS). Internet Society, 2019

  51. [51]

    TLS beyond the browser: Combin- ing end host and network data to understand application behavior,

    B. Anderson and D. A. McGrew, “TLS beyond the browser: Combin- ing end host and network data to understand application behavior,” inProceedings of the Internet Measurement Conference (IMC ’19). ACM, 2019, pp. 379–392

  52. [52]

    Passive fingerprinting of HTTP/2 clients,

    O. Segal, A. Fridman, and E. Shuster, “Passive fingerprinting of HTTP/2 clients,” Akamai Technologies White Paper, 2017, presented at Black Hat Europe 2017. https://www.blackhat.com/docs/eu-17/ma terials/eu-17-Shuster-Passive-Fingerprinting-Of-HTTP2-Clients-wp. pdf

  53. [53]

    Good bot, bad bot: Characterizing automated browsing activity,

    X. Li, B. Amin Azad, A. Rahmati, and N. Nikiforakis, “Good bot, bad bot: Characterizing automated browsing activity,” in2021 IEEE Symposium on Security and Privacy (SP). IEEE, 2021, pp. 1589– 1605

  54. [54]

    When handshakes tell the truth: Detecting web bad bots via TLS fingerprints,

    G. Jarad and K. Bicakci, “When handshakes tell the truth: Detecting web bad bots via TLS fingerprints,” 2026. [Online]. Available: https://arxiv.org/abs/2602.09606

  55. [55]

    Exposing LLM user privacy via traffic fingerprint analysis: A study of privacy risks in LLM agent interactions,

    Y . Zhang, X. Deng, Z. Gu, Y . Chen, K. Xu, Q. Li, and J. Wu, “Exposing LLM user privacy via traffic fingerprint analysis: A study of privacy risks in LLM agent interactions,” 2025. [Online]. Available: https://arxiv.org/abs/2510.07176

  56. [56]

    Tracked without a trace: linking sessions of users by unsupervised learning of patterns in their dns traffic,

    M. Kirchler, D. Herrmann, J. Lindemann, and M. Kloft, “Tracked without a trace: linking sessions of users by unsupervised learning of patterns in their dns traffic,” inProceedings of the 2016 ACM workshop on artificial intelligence and security, 2016, pp. 23–34

  57. [57]

    Users’ fingerprinting techniques from TCP traffic,

    L. Vassio, D. Giordano, M. Trevisan, M. Mellia, and A. P. C. da Silva, “Users’ fingerprinting techniques from TCP traffic,” in Proceedings of the Workshop on Big Data Analytics and Machine Learning for Data Communication Networks, 2017, pp. 49–54

  58. [58]

    Rethinking fingerprinting: An assessment of behavior-based methods at scale and implications for web tracking,

    K. Crichton, L. F. Cranor, and N. Christin, “Rethinking fingerprinting: An assessment of behavior-based methods at scale and implications for web tracking,” Proceedings on Privacy Enhancing Technologies, 2025

  59. [59]

    Fp-agent: Fingerprinting ai browsing agents,

    E. Wang, Z. Shafiq, and Y. Vekaria, “Fp-agent: Fingerprinting ai browsing agents,” 2026. [Online]. Available: https://arxiv.org/abs/2605.01247

  60. [60]

    About Let’s Encrypt,

    Let’s Encrypt, “About Let’s Encrypt,” https://letsencrypt.org/about/, 2021, [Accessed 11-06-2026]

  61. [61]

    Emergence WebVoyager: Toward consistent and transparent evaluation of (Web) Agents in the wild,

    D. Akkil, M. Allaham, A. Raj, T. Abuelsaad, and R. Kokku, “Emergence WebVoyager: Toward consistent and transparent evaluation of (Web) Agents in the wild,” arXiv preprint arXiv:2603.29020, 2026

  62. [62]

    What Is Claude Code Computer Use? How to Control Your Desktop with AI — mindstudio.ai,

    M. Team, “What Is Claude Code Computer Use? How to Control Your Desktop with AI — mindstudio.ai,” https://www.mindstudio.ai/blog/what-is-claude-code-computer-use, 2026, [Accessed 12-06-2026]

  63. [63]

    How Gemini 2.5 Computer Use Lets AI Control Web Interfaces (Safely and Smartly),

    SiderAI, “How Gemini 2.5 Computer Use Lets AI Control Web Interfaces (Safely and Smartly),” https://sider.ai/blog/ai-tools/how-gemini-2 5-computer-use-lets-ai-control-web-interfaces-safely-and-smartly, 2025, [Accessed 12-06-2026]

  64. [64]

    Introducing Operator,

    OpenAI, “Introducing Operator,” https://openai.com/index/introducing-operator/, Jan. 2025, [Accessed 11-06-2026]

  65. [65]

    Openai operator explained: How ai agents actually control the web,

    I. Raman, “Openai operator explained: How ai agents actually control the web,” https://anchorbrowser.io/blog/how-openai-operator-works-with-ai-agents, [Accessed 12-06-2026]

  66. [66]

    A new path for kyber on the web,

    D. Adrian, D. Benjamin, B. Beck, and D. O’Brien, “A new path for kyber on the web,” https://security.googleblog.com/2024/09/a-new-path-for-kyber-on-web.html, 2024, [Accessed 12-06-2026]

  67. [67]

    Hypertext Transfer Protocol Version 2 (HTTP/2),

    M. Belshe, R. Peon, and M. Thomson, “Hypertext Transfer Protocol Version 2 (HTTP/2),” RFC 7540, May 2015, obsoleted by RFC 9113. [Accessed 27-05-2026]. [Online]. Available: https://www.rfc-editor.org/info/rfc7540

  68. [68]

    Extensible Prioritization Scheme for HTTP,

    K. Oku and L. Pardue, “Extensible Prioritization Scheme for HTTP,” RFC 9218, Jun. 2022, [Accessed 27-05-2026]. [Online]. Available: https://www.rfc-editor.org/info/rfc9218

Appendix A. Measurement Setup for Agents

Figure 8 shows the task execution prompt for fingerprint measurement of web agents as discussed in Section 4.1.

Appendix B. User Behavior Testbe...

- https://<testbed domain>/subscribe-v1.html
- https://<testbed domain>/subscribe-v2.html
- https://<testbed domain>/subscribe-v3.html
- https://<testbed domain>/s2-scroll-gate.html
- https://<testbed domain>/s3-hover-reveal.html
- https://<testbed domain>/s4-dom-mismatch.html
- https://<testbed domain>/s5-delayed-feedback.html

For each target URL:

1. Navigate directly to the exact URL.
2. Once the page loads, inspect the page and decide what a reasonable user would naturally do.
3. Perform one short natural interaction sequence.
   - A sequence may contain multiple low-level browser actions if they naturally belong together.
   - For example, typing into a field and pressing a nearby submit button may count as one sequence.
   - Choose synthetic input values yourself.
   - Do not use real personal information.
   - Do not ask the user what value to enter.
4. After you perform any meaningful page interaction sequence, the current URL is considered complete. Immediately navigate directly to the next target URL.
5. If you are unsure what to do next, do not ask the user. Instead, navigate directly to the next unvisited target URL.
