pith · machine review for the scientific record

arxiv: 2604.02544 · v1 · submitted 2026-04-02 · 💻 cs.SE

Recognition: no theorem link

Developer Experience with AI Coding Agents: HTTP Behavioral Signatures in Documentation Portals

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 20:31 UTC · model grok-4.3

classification 💻 cs.SE
keywords AI coding agents · documentation portals · HTTP behavioral signatures · developer experience · engagement metrics · AI assistants · User-Agent patterns · prefetch strategies

The pith

AI coding agents compress multi-page documentation navigation into one or two HTTP requests, rendering traditional metrics like session depth and bounce rate unreliable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how nine AI coding agents and six AI assistant services interact with a live developer documentation endpoint through HTTP requests. It identifies distinct behavioral signatures in headers, User-Agent strings, prefetch strategies, and runtime patterns. The central finding is that these agents typically fetch content in a single request or pair of requests rather than through extended multi-page sessions. Because of this compression, conventional analytics tools built around click paths, time-on-page, and bounce rates no longer reliably indicate how much documentation developers actually consume. The work outlines immediate adaptations for documentation teams, including new machine-readable standards and revised instrumentation for AI referral traffic.
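The header-level signatures described here lend themselves to a simple first-pass traffic filter. A minimal sketch in Python, assuming a hypothetical token list — the substrings below are illustrative stand-ins, not the User-Agent signatures the paper actually reports:

```python
# First-pass tagging of documentation requests as AI-agent vs. browser
# traffic via User-Agent substrings. The token list is hypothetical; the
# paper's real signature set also uses headers, prefetch, and runtime cues.
AGENT_TOKENS = ("claude", "gptbot", "perplexity", "cursor", "aider", "windsurf")

def classify_user_agent(ua: str) -> str:
    """Return 'ai-agent' if the UA contains a known agent token, else 'browser'."""
    ua_lower = ua.lower()
    return "ai-agent" if any(tok in ua_lower for tok in AGENT_TOKENS) else "browser"

print(classify_user_agent("Perplexity-User/1.0"))                   # → ai-agent
print(classify_user_agent("Mozilla/5.0 (Windows NT 10.0; Win64)"))  # → browser
```

User-Agent strings alone are spoofable, so any production detector would combine this with the header-pattern and request-volume signals the paper enumerates.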

Core claim

The study demonstrates that AI agent access to documentation portals produces identifiable HTTP fingerprints while simultaneously collapsing what used to be multi-step navigation into one or two requests, which directly invalidates legacy engagement metrics that assume sequential human browsing.

What carries the argument

HTTP behavioral signatures consisting of User-Agent strings, header patterns, prefetch strategies, and request volume patterns observed from the nine listed agents and six services.

If this is right

  • Traditional session depth, time-on-page, click-path, and bounce-rate metrics become unreliable indicators of actual documentation consumption when AI agents are involved.
  • Documentation portals must instrument separate analytics channels to distinguish and measure AI referral traffic.
  • Teams should adopt emerging machine-readable formats such as AGENTS.md, llms.txt, skill.md, and agent-permissions.json to communicate usage rules directly to agents.
  • Feedback loops between documentation and agents can shift to MCP server-based channels rather than relying solely on human page views.
  • Content design should become tokenomics-aware to account for the different consumption costs and constraints of AI agents versus human readers.
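Of the machine-readable standards listed, llms.txt is the most content-facing. A minimal sketch of what a portal might serve at `/llms.txt`, following the proposal's Markdown layout — the portal name and URLs are hypothetical:

```markdown
# Example Dev Portal

> API and SDK documentation for the Example platform, with
> agent-friendly Markdown mirrors of every page.

## Docs

- [Quickstart](https://docs.example.com/quickstart.md): install, auth, first call
- [API reference](https://docs.example.com/api.md): endpoints, parameters, errors

## Optional

- [Changelog](https://docs.example.com/changelog.md): release history
```

The format is deliberately plain: an H1 title, a blockquote summary, then sections of curated links that an agent can fetch in one or two requests — which is exactly the consumption pattern the paper observes.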

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • New engagement metrics could be derived directly from request fingerprint patterns rather than from navigation sequences.
  • Documentation portals may need to publish explicit agent-access policies to avoid unintended scraping or rate-limit conflicts.
  • The compression effect could alter how search engines and AI indexes discover and rank technical content if agents bypass traditional link structures.
  • Long-term stability of these signatures would require periodic re-validation as agent implementations evolve.

Load-bearing premise

The behavioral signatures seen from these specific agents and services on one documentation endpoint remain stable, uniquely identifiable, and generalizable to other sites and future agent versions.

What would settle it

A test that replays the same nine agents and six services against a second, independent documentation portal and finds that their request patterns either change materially or become indistinguishable from ordinary browser traffic.
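That replication test could be operationalized as a simple request-count comparison. A sketch, with invented counts standing in for replayed agent traffic:

```python
# Does the 1-2 request "compression" pattern replicate on a second portal?
# Counts below are invented placeholders for per-task replayed agent traffic.
from statistics import mean

def compression_holds(request_counts: list[int], threshold: float = 2.0) -> bool:
    """True if the mean per-task request count stays at or below the threshold."""
    return mean(request_counts) <= threshold

portal_a = [1, 2, 1, 1, 2, 1]  # original endpoint (illustrative)
portal_b = [1, 1, 2, 2, 1, 1]  # independent replication portal (illustrative)

print(compression_holds(portal_a), compression_holds(portal_b))  # → True True
```

A real version would also need a human baseline on both portals, since the claim is not just "few requests" but "fewer than sequential human browsing on the same content."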

Figures

Figures reproduced from arXiv: 2604.02544 by Oleksii Borysenko.

Figure 1. Cursor AI coding agent retrieving developer doc.
Original abstract

The rapid adoption of AI coding agents and AI assistant web services is fundamentally changing how developers discover, consume, and interact with technical documentation. This paper studies that transformation across three interconnected dimensions: documentation accessibility, content analytics, and feedback systems. We present an empirical study of HTTP request fingerprints from nine AI coding agents (Aider, Antigravity, Claude Code, Cline, Cursor, Junie, OpenCode, VS Code, and Windsurf) and six AI assistant services (ChatGPT, Claude, Google Gemini, Google NotebookLM, MistralAI, and Perplexity) accessing a live developer documentation endpoint, revealing identifiable behavioral signatures in HTTP runtime environments, pre-fetch strategies, User-Agent strings, and header patterns. Our study shows that AI agent access compresses multi-page navigation into a single or two requests, making traditional engagement metrics - session depth, time-on-page, click path, and bounce rate - unreliable indicators of actual documentation consumption. We discuss practical adaptations for developer portal teams, including tokenomics-aware documentation design, adoption of emerging machine-readable standards (AGENTS.md, llms.txt, skill.md, agent-permissions.json), MCP server-based feedback channels, and analytics instrumentation for AI referral traffic.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper reports an empirical study of HTTP request fingerprints collected from nine AI coding agents (Aider, Antigravity, Claude Code, Cline, Cursor, Junie, OpenCode, VS Code, Windsurf) and six AI assistant services (ChatGPT, Claude, Google Gemini, Google NotebookLM, MistralAI, Perplexity) accessing a single live developer documentation endpoint. It identifies distinctive behavioral signatures in User-Agent strings, headers, runtime environments, and pre-fetch strategies, and asserts that these agents compress what would be multi-page human navigation into one or two requests, thereby rendering conventional engagement metrics (session depth, time-on-page, click path, bounce rate) unreliable for measuring actual documentation consumption. The manuscript concludes with recommendations for documentation portal teams, including tokenomics-aware design, adoption of standards such as AGENTS.md and llms.txt, MCP-based feedback, and new analytics instrumentation for AI referral traffic.

Significance. If the compression effect and signatures prove stable and generalizable, the work would be significant for software engineering practice: documentation portals would need to redesign analytics, content delivery, and feedback mechanisms to accommodate AI-mediated access rather than human browsing patterns. The identification of concrete HTTP-level observables offers a practical starting point for instrumentation, though the single-endpoint scope limits immediate generalizability.

major comments (3)
  1. [Abstract] The central claim that AI agents compress multi-page navigation into one or two requests is presented without any reported request counts, session data, comparison to human baselines on the same endpoint, or statistical measures of uniqueness, making it impossible to evaluate whether the observed pattern is robust or endpoint-specific.
  2. [Abstract] No cross-site replication or variation in documentation structure (link depth, authentication, content volume) is described, so the assertion that traditional metrics are unreliable cannot be distinguished from an artifact of the particular endpoint studied.
  3. [Abstract] The empirical study supplies no sample sizes, raw traffic logs, error analysis, or statistical tests for the claimed behavioral signatures, preventing assessment of how reliably the nine agents and six services can be distinguished from one another or from human traffic.
minor comments (2)
  1. The enumeration of agents and services would be clearer if presented in a table with columns for type (agent vs. service), version if known, and observed signature features.
  2. The manuscript would benefit from explicit discussion of potential confounds such as rate-limiting, caching, or CDN behavior that could produce similar 1-2 request patterns independent of AI intent.

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below, clarifying the scope of our empirical study while strengthening the manuscript where revisions are feasible.

Point-by-point responses
  1. Referee: [Abstract] The central claim that AI agents compress multi-page navigation into one or two requests is presented without any reported request counts, session data, comparison to human baselines on the same endpoint, or statistical measures of uniqueness, making it impossible to evaluate whether the observed pattern is robust or endpoint-specific.

    Authors: The abstract was intentionally concise. The full manuscript reports request counts and session data from controlled interactions with each of the nine agents and six services on the live endpoint. We have revised the abstract to summarize these counts (AI agents averaged 1-2 requests per documentation task versus multi-request human sessions), include a brief human baseline comparison collected on the same endpoint, and reference the uniqueness metrics (header pattern distinctiveness) presented in the results. Statistical measures of signature uniqueness are detailed via confusion matrices in Section 4. revision: yes

  2. Referee: [Abstract] No cross-site replication or variation in documentation structure (link depth, authentication, content volume) is described, so the assertion that traditional metrics are unreliable cannot be distinguished from an artifact of the particular endpoint studied.

    Authors: The study was scoped to a single production documentation endpoint to isolate AI behavioral signals under realistic conditions. We acknowledge this limits claims of broad generalizability. We have added an explicit Limitations subsection discussing the single-endpoint design, the absence of cross-site replication, and the potential influence of documentation structure. The compression pattern held consistently across all tested agents, but we now qualify the unreliability claim as observed for this class of portal. revision: partial

  3. Referee: [Abstract] The empirical study supplies no sample sizes, raw traffic logs, error analysis, or statistical tests for the claimed behavioral signatures, preventing assessment of how reliably the nine agents and six services can be distinguished from one another or from human traffic.

    Authors: Sample sizes (minimum 30 interactions per agent/service) and collection methodology are described in Section 3. Raw logs cannot be released due to privacy and endpoint terms. We have added error analysis (misclassification rates for header-based detection) and statistical tests (uniqueness via Jaccard similarity on headers, precision/recall for agent identification) to the results. A new summary table now reports distinction performance between AI agents, web services, and human baselines. revision: yes
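The Jaccard-based uniqueness measure the authors cite is easy to make concrete. A sketch over header-name sets — the two sets below are invented for illustration, not observed data from the study:

```python
# Jaccard similarity between header-name sets, the distinctiveness measure
# the rebuttal references. Both header sets are illustrative examples.
def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; identical sets score 1.0, disjoint sets 0.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

agent_headers = {"host", "user-agent", "accept", "x-api-client"}
browser_headers = {"host", "user-agent", "accept", "accept-language",
                   "accept-encoding", "referer", "cookie"}

# 3 shared names out of 8 distinct names overall.
print(round(jaccard(agent_headers, browser_headers), 3))  # → 0.375
```

Low similarity between agent and browser header sets is what makes header-based detection feasible; the precision/recall figures the authors added would then quantify how often that separation holds per request.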

standing simulated objections (unresolved)
  • Cross-site replication across portals with differing structures, authentication, and content depth, which would require new data collection outside the current study scope.

Circularity Check

0 steps flagged

No circularity: purely observational empirical study

full rationale

The paper reports direct HTTP request observations from nine AI coding agents and six services against one live documentation endpoint. No equations, fitted parameters, predictions, or derivations are present. Claims about compressed navigation and unreliable traditional metrics rest on external traffic logs rather than self-definitions or self-citation chains. No load-bearing steps reduce to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities; the work is an observational traffic analysis that relies on external live logs rather than theoretical constructs.

pith-pipeline@v0.9.0 · 5513 in / 1005 out tokens · 34490 ms · 2026-05-13T20:31:31.054606+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Identifying AI Web Scrapers Using Canary Tokens

    cs.CR 2026-05 conditional novelty 7.0

    Unique canary tokens served to visiting scrapers can be recovered from LLM outputs to identify which scrapers feed data to which of 22 tested production LLMs.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages · cited by 1 Pith paper

  1. [1]

    The 2025 developer survey

Stack Overflow. The 2025 developer survey. https://survey.stackoverflow.co/2025/, 2025

  2. [2]

    Dated data: Tracing knowledge cutoffs in large language models

Joel Cheng, Marc Marone, Orion Weller, Dawn Lawrie, Daniel Khashabi, and Benjamin Van Durme. Dated data: Tracing knowledge cutoffs in large language models. In Proceedings of the First Conference on Language Modeling (COLM 2024)

  3. [3]

    arXiv:2403.12958 [cs.CL]

  4. [4]

Context7: Up-to-date, version-specific documentation and code examples for AI coding agents

    Upstash. Context7: Up-to-date, version-specific documentation and code examples for AI coding agents. https://github.com/upstash/context7, 2024

  5. [5]

    Developers’ experience with generative AI: First insights from an empirical mixed-methods field study

Christoph Brandebusemeyer, Tobias Schimmer, and Bert Arnrich. Developers’ experience with generative AI: First insights from an empirical mixed-methods field study. In Proceedings of the IEEE/ACM International Conference on Software Engineering, Software Engineering in Practice (ICSE-SEIP), 2026. arXiv:2512.19926

  6. [6]

Towards a science of developer eXperience (DevX). Journal of Object Technology, 24(2), 2025

    Benoit Combemale. Towards a science of developer eXperience (DevX). Journal of Object Technology, 24(2), 2025. arXiv:2506.23715 [cs.SE]

  7. [7]

    Declare your independence: Block AI bots, scrapers, and crawlers with a single click

Cloudflare. Declare your independence: Block AI bots, scrapers, and crawlers with a single click. https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/, 2024

  8. [8]

    Toward an AI-native internet: Rethinking the web architecture for semantic retrieval, 2025

    Muhammad Bilal, Zafar Qazi, and Marco Canini. Toward an AI-native internet: Rethinking the web architecture for semantic retrieval, 2025. arXiv:2511.18354 [cs.NI]

  9. [9]

    2025 organic traffic crisis: Zero-click and AI impact analysis report

Vasyl Kuryatnik. 2025 organic traffic crisis: Zero-click and AI impact analysis report. https://thedigitalbloom.com/learn/2025-organic-traffic-crisis-analysis-report/, 2025

  10. [10]

Somesite I used to crawl: Awareness, agency and efficacy in protecting content creators from AI crawlers

    Eric Liu, Ethan Luo, Shawn Shan, Geoffrey M. Voelker, Ben Y. Zhao, and Stefan Savage. Somesite I used to crawl: Awareness, agency and efficacy in protecting content creators from AI crawlers. In Proceedings of the 2025 ACM Internet Measurement Conference (IMC ’25). ACM, 2025

  11. [11]

    AI-native documentation

    Mintlify. AI-native documentation. https://www.mintlify.com/docs/ai-native, 2024

  12. [12]

    NotebookLM: An LLM with RAG for active learning and collaborative tutoring, 2025

    Emanuele Tufino. NotebookLM: An LLM with RAG for active learning and collaborative tutoring, 2025. arXiv:2504.09720v1 [physics.ed-ph]

  13. [13]

Who blocks OpenAI, Google AI and Common Crawl?

    palewire. Who blocks OpenAI, Google AI and Common Crawl? https://palewi.re/docs/news-homepages/openai-gptbot-robotstxt.html, 2025

  14. [14]

    AI companies ignoring robots.txt

Michael Sullivan. AI companies ignoring robots.txt. https://mjtsai.com/blog/2024/06/24/ai-companies-ignoring-robots-txt/, 2024

  15. [15]

Anthropic. Does Anthropic crawl data from the web, and how can site owners block the crawler? https://privacy.anthropic.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler, 2024

  16. [16]

The state of docs report 2026: AI and documentation consumption

    The state of docs report 2026: AI and documentation consumption. https://www.stateofdocs.com/2026/ai-and-documentation-consumption, 2026

  17. [17]

    Octoverse 2025: A new developer joins GitHub every second as AI leads TypeScript to number 1

GitHub. Octoverse 2025: A new developer joins GitHub every second as AI leads TypeScript to number 1. https://github.blog/news-insights/octoverse/octoverse-a-new-developer-joins-github-every-second-as-ai-leads-typescript-to-1/, 2025

  18. [18]

Cisco API documentations is now adapted for Gen AI technologies

    Cisco. Cisco API documentations is now adapted for Gen AI technologies. https://blogs.cisco.com/developer/cisco-api-documentations-is-now-adapted-for-gen-ai-technologies, 2024

  19. [19]

    Secure firewall management center REST API quick start guide, version 10.0

Cisco. Secure firewall management center REST API quick start guide, version 10.0. https://www.cisco.com/c/en/us/td/docs/security/firepower/10-0/API/REST/firepower_management_center_rest_api_quick_start_guide_10_0/Objects_In_The_REST_API.html, 2024

  20. [20]

    Permission manifests for web agents, 2026

    Samuele Marro et al. Permission manifests for web agents, 2026. Lightweight Agent Standards Working Group (LAS-WG). arXiv:2601.02371v2 [cs.CY]

  21. [21]

    https://agents.md/, 2026

    AGENTS.md: The standard for AI agent instructions. https://agents.md/, 2026. Accessed: 2026-04-20

  22. [22]

Template for creating a new open source project in the CiscoDevNet GitHub organization

    Cisco DevNet. Template for creating a new open source project in the CiscoDevNet GitHub organization. https://github.com/CiscoDevNet/devnet-template, 2026

  23. [23]

    Cisco DevNet sandboxes

    Cisco. Cisco DevNet sandboxes. https://developer.cisco.com/site/sandbox/, 2026. Accessed: 2026-04-20

  24. [24]

    Copy for AI: Getting started documentation

Palo Alto Networks. Copy for AI: Getting started documentation. https://pan.dev/access/docs/insights/getting_started-10/, 2024

  25. [25]

    Sharp tools: How developers wield agentic AI in real software engineering tasks, 2025

    Aman Kumar et al. Sharp tools: How developers wield agentic AI in real software engineering tasks, 2025. arXiv:2506.12347v2 [cs.SE]

  26. [26]

Developer interaction patterns with proactive AI: A five-day field study

    Nicole Kuo, Agnia Sergeyuk, Vicky Chen, and Moshir Izadi. Developer interaction patterns with proactive AI: A five-day field study. In Proceedings of the 31st International Conference on Intelligent User Interfaces (IUI ’26), 2026. arXiv:2601.10253

  27. [27]

    GEO: Generative engine optimization, 2023

Pranjal Aggarwal et al. GEO: Generative engine optimization, 2023. arXiv:2311.09735 [cs.IR]