Disentangling Answer Engine Optimization from Platform Growth: A Log-Based Natural Experiment on ChatGPT Referral Traffic

Kazuki Nakayashiki; Keisuke Watanabe

arxiv: 2606.04362 · v1 · pith:XRAKQUNUnew · submitted 2026-06-03 · 💻 cs.IR · cs.CL

Pith reviewed 2026-06-28 04:40 UTC · model grok-4.3

keywords answer engine optimization · referral traffic · natural experiment · interrupted time series · ChatGPT · AEO · platform growth · on-domain control

The pith

After AEO, ChatGPT referrals to treated pages rose 1.82-fold relative to untreated pages on the same domain, once platform growth is netted out via on-domain controls.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tracks first-party server logs of ChatGPT referrals to hundreds of thousands of YouTube Q&A pages on one domain. A defined bundle of AEO changes was applied to one subset of pages in January 2026 while the remainder stayed untouched and served as a contemporaneous control. Raw referrals to the whole site rose 5.7 times over the study window, but untreated pages rose 3.5 times, showing that most observed growth traces to the answer engine itself rather than any optimization. An interrupted time-series model on the weekly treated-to-control ratio detects a discrete level jump of 1.82 times aligned with the intervention date. A placebo-in-time test on the same ratio yields p=0.16, leaving the result suggestive given the short pre-period.

Core claim

Because the AEO interventions were concentrated on only part of the site, the untreated remainder absorbs the platform-level tailwind. The ratio of treated to control referrals therefore isolates the incremental effect of the optimization, producing an estimated 1.82-fold level increase (95% CI 1.31-2.54) at the intervention point that survives engagement filters and alternative specifications.
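One way to write out why the ratio nets out the tailwind is sketched below. The notation is an assumption introduced here for illustration, not the paper's: $T_t$ and $C_t$ are weekly treated and control referrals, $P_t$ is a platform growth factor shared by both groups, $D_t$ is 1 from the intervention week onward, $\gamma$ allows for a trend in the ratio, and $\beta$ is the log level shift.

```latex
\begin{aligned}
T_t &= a \, e^{\gamma t} \, P_t \, e^{\beta D_t}, \qquad C_t = c \, P_t, \\
\log\frac{T_t}{C_t} &= \log\frac{a}{c} + \gamma t + \beta D_t, \\
e^{\hat\beta} &\approx 1.82 \quad \text{(reported 95\% CI 1.31--2.54)}.
\end{aligned}
```

Under these assumptions $P_t$ cancels in the ratio, so the step coefficient reflects only what changed for treated pages relative to controls. The cancellation fails if the intervention also moves $C_t$, which is the spillover concern raised later on this page.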

What carries the argument

The weekly treated/control referral ratio analyzed via interrupted time-series regression that identifies the discrete level shift exactly at the January 2026 AEO intervention date.
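A minimal sketch of such a level-shift regression follows, assuming a model with an intercept, a single linear trend, and a step at the intervention week. The exact specification used in the paper is not given on this page, so the function names, the synthetic series, and the break week `t0 = 24` are all hypothetical. Plain Python is used to keep the example dependency-free.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_its(log_ratio, t0):
    """OLS of log_ratio[t] on an intercept, a trend t, and a step 1[t >= t0]."""
    rows = [[1.0, float(t), 1.0 if t >= t0 else 0.0]
            for t in range(len(log_ratio))]
    k = 3
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, log_ratio)) for i in range(k)]
    intercept, slope, level = solve_linear(xtx, xty)
    return {"intercept": intercept, "slope": slope,
            "level_shift": level, "multiplier": math.exp(level)}

# Synthetic, noise-free weekly series (hypothetical numbers, not the paper's data):
# a mild pre-trend plus a step of log(1.82) at week t0.
weeks, t0 = 40, 24
log_ratio = [-1.0 + 0.01 * t + (math.log(1.82) if t >= t0 else 0.0)
             for t in range(weeks)]
fit = fit_its(log_ratio, t0)
print(round(fit["multiplier"], 3), round(fit["slope"], 3))
```

Exponentiating the step coefficient turns the log-scale shift into the multiplicative form quoted above. The sketch omits standard errors; the abstract reports HAC inference, which would need a Newey-West style covariance estimate on top of this fit.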

If this is right

Raw AEO growth multiples reported publicly substantially overstate the causal contribution of the optimization itself.
On-domain untreated pages can serve as a contemporaneous control that separates treatment effects from answer-engine platform growth.
Google organic clicks to treated pages stayed within the ambient site-wide trend and indexation was preserved.
The methodological separation of treatment from tailwind matters more than the specific 1.82 multiple obtained here.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same on-domain control design could be replicated on other high-traffic sites to test AEO effects on additional answer engines.
Extending the pre-intervention observation window would increase statistical power and reduce reliance on the current short noisy baseline.
If the lift generalizes, site owners could run internal AEO pilots with on-domain controls before scaling optimization resources.

Load-bearing premise

The untreated pages on the same domain form a valid contemporaneous control that fully absorbs the platform-level tailwind without any spillover from the AEO interventions applied to the treated subset.

What would settle it

If re-running the interrupted time-series model with a longer pre-intervention window, or at placebo dates, produced jumps of similar or larger magnitude in the treated/control ratio, that would undercut the claim that the observed step change is caused by the AEO bundle.
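A placebo-in-time check of this kind can be sketched as follows. This is an illustrative construction, not the paper's procedure: it assumes the same intercept-plus-trend-plus-step model at every candidate week, counts the real break week among the candidates so the p-value is never zero, and uses a hypothetical `min_side` guard to keep a few weeks on each side of a candidate break. The synthetic series and seed are made up.

```python
import math
import random

def residualize(v):
    """Residuals of v after OLS on an intercept and a linear time trend."""
    n = len(v)
    t_mean = (n - 1) / 2
    v_mean = sum(v) / n
    stt = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (x - v_mean) for t, x in enumerate(v)) / stt
    return [x - v_mean - slope * (t - t_mean) for t, x in enumerate(v)]

def level_shift(y, t0):
    """Step coefficient from y ~ 1 + t + 1[t >= t0], via Frisch-Waugh-Lovell."""
    step = [1.0 if t >= t0 else 0.0 for t in range(len(y))]
    ry, rs = residualize(y), residualize(step)
    return sum(a * b for a, b in zip(ry, rs)) / sum(b * b for b in rs)

def placebo_p_value(y, t0, min_side=6):
    """Share of candidate break weeks whose |jump| is at least the observed one."""
    observed = abs(level_shift(y, t0))
    candidates = range(min_side, len(y) - min_side + 1)
    jumps = [abs(level_shift(y, c)) for c in candidates]
    return sum(j >= observed for j in jumps) / len(jumps)

# Synthetic noisy weekly log-ratio (hypothetical, not the paper's data).
rng = random.Random(0)
weeks, t0 = 40, 24
y = [-1.0 + 0.01 * t + (math.log(1.82) if t >= t0 else 0.0) + rng.gauss(0, 0.3)
     for t in range(weeks)]
jump = level_shift(y, t0)
p = placebo_p_value(y, t0)
print(round(math.exp(jump), 2), round(p, 3))
```

With a short series there are few admissible placebo weeks, so the smallest attainable p-value is coarse. That is one reason a short pre-period limits what such a test can establish.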

Figures

Figures reproduced from arXiv: 2606.04362 by Kazuki Nakayashiki, Keisuke Watanabe.

**Figure 2.** Interrupted time series on weekly log(treated/control). A significant pre-trend (red line) continues across a discrete level break of ×1.82 at the intervention; the shaded band marks the approximate rollout window (Dec 2025–Jan 2026), matching the imprecise t0.

**Figure 3.** SEO safety: treated-page organic clicks and impressions, indexed to July 2025.

**Figure 4.** Placebo-in-time test. The observed January break (red) is large but not clearly outside …

Original abstract

Large language model (LLM) "answer engines" such as ChatGPT now send measurable referral traffic to the open web, and a practice analogous to search engine optimization, here called Answer Engine Optimization (AEO), has emerged. Public AEO success stories typically quote large raw growth multiples, but raw referral growth is confounded by the rapid platform-level growth of the answer engines themselves. We report a longitudinal field study on a single high-traffic domain (glasp.co) whose corpus of hundreds of thousands of YouTube question-and-answer pages received a defined bundle of AEO interventions in January 2026 (detailed in Section 4). Because the interventions were concentrated on one subset of the site, the untreated remainder of the same domain acts as a contemporaneous control that absorbs the platform tailwind. Using first-party analytics and server logs rather than probabilistic third-party estimators, we find: (1) raw growth is dominated by the platform tailwind: on monthly aggregates total ChatGPT referrals grew 5.7x while untreated pages on the same domain grew 3.5x over the same window; (2) an interrupted time-series model on the weekly treated/control ratio estimates a discrete, intervention-aligned level increase of 1.82x (95% CI 1.31-2.54, HAC p=0.001), robust across engagement-filtered traffic (2.27x) and alternative specifications; (3) however, a conservative placebo-in-time permutation test yields p=0.16, so the effect is suggestive, not conclusive, given a short and noisy pre-period; and (4) Google organic clicks to treated pages did not fall beyond the ambient site-wide trend and indexation was preserved, consistent with the SEO-protection rule. The methodological message, separating treatment from platform tailwind with an on-domain control, matters more than any single multiple, and implies that headline AEO multiples substantially overstate causal effect.
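The reported interval can be sanity-checked on the log scale, where a multiplicative estimate and its confidence bounds should be roughly symmetric. The sketch below backs out an implied standard error and a normal-approximation p-value from the three published numbers. The function name and the normal reference distribution are assumptions; the paper's HAC p=0.001 may rest on a different reference distribution or on rounding, so the two need not match exactly.

```python
import math

def log_scale_summary(point, lower, upper, z_crit=1.959964):
    """Back out log-scale quantities from a multiplicative estimate and 95% CI."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    center = math.exp((log_lo + log_hi) / 2)   # geometric midpoint of the CI
    se_log = (log_hi - log_lo) / (2 * z_crit)  # implied SE of the log effect
    z = math.log(point) / se_log
    p_normal = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return {"center": center, "se_log": se_log, "z": z, "p_normal": p_normal}

summary = log_scale_summary(point=1.82, lower=1.31, upper=2.54)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The geometric midpoint of the interval lands close to the 1.82 point estimate, which is what one would expect if the interval was built symmetrically around a log-scale coefficient and then exponentiated.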

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper isolates a suggestive 1.82x AEO lift via on-domain control but the placebo test at p=0.16 leaves the result marginal.

The main takeaway is that this paper applies an interrupted time-series to first-party referral logs from glasp.co and estimates a 1.82x level shift in the treated-to-control ratio right after their January 2026 AEO bundle. Raw growth numbers are much larger, so the design's point is to subtract the platform tailwind using untreated pages on the same domain.

What stands out is the use of actual server logs instead of third-party estimates and the explicit check that Google clicks and indexation did not drop. The authors also run engagement filters and alternative specs, and they flag the placebo-in-time result themselves. That transparency is useful.

The soft spots are exactly where the stress test flags them. The pre-period is short and noisy, which makes it hard to confirm parallel trends in the ratio before the intervention. The placebo permutation gives p=0.16, so the timing-specific claim is not robust. Spillover is also plausible: if the AEO changes affect how the whole domain is indexed or surfaced by the LLM, the control pages are no longer clean. The paper notes these issues but still presents the 1.82x figure as the central result.

This is the kind of work that belongs in an IR or digital marketing reading group. Readers who care about measurement of LLM referral effects will find the control idea worth discussing even if they end up skeptical of the magnitude. It deserves peer review because the identification strategy is laid out clearly and the data are real, not because the evidence is strong. I would send it out rather than desk reject.

Referee Report

2 major / 1 minor

Summary. The manuscript reports results from a natural experiment on glasp.co in which a defined bundle of AEO interventions was applied in January 2026 to one subset of YouTube Q&A pages while the remainder of the domain served as an on-site control. Using first-party server logs and an interrupted time-series model on the weekly treated/control referral ratio, the authors estimate a discrete level increase of 1.82x (95% CI 1.31-2.54) aligned with the intervention date. They also report that raw referral growth is dominated by platform expansion (5.7x total vs. 3.5x on untreated pages) and that a placebo-in-time test yields p=0.16, leading them to describe the AEO effect as suggestive rather than conclusive.

Significance. If the identification assumptions hold, the paper supplies a replicable method for separating AEO treatment effects from platform-level tailwinds using contemporaneous on-domain controls and first-party data. This addresses a common confound in AEO evaluation and shows that headline growth multiples substantially overstate causal impact. The emphasis on the methodological contribution over any single multiple is appropriate given the reported limitations.

major comments (2)

[Section 4 and interrupted time-series setup] The claim that untreated pages on the same domain form a valid contemporaneous control that fully absorbs the platform tailwind rests on the assumption of zero spillover. If AEO interventions on the treated subset alter domain-level signals used by LLMs for indexing or referral, the post-intervention rise in the treated/control ratio could be partly attributable to spillover rather than direct treatment; the manuscript provides no direct test or bounding exercise for this possibility.
[Placebo-in-time permutation test and pre-period description] The reported p=0.16 already indicates that the 1.82x level shift is not robust to timing permutation. Combined with the explicitly noted short and noisy pre-period, this limits verification of parallel pre-trends in the ratio and weakens the support for attributing the discrete change specifically to the January 2026 interventions.
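The kind of bounding exercise the first comment asks for can be illustrated with simple arithmetic. The sketch below is hypothetical and not taken from the manuscript: it assumes the observed ratio shift equals the multiplier on treated pages divided by a spillover multiplier on control pages, and the spillover values tried are arbitrary.

```python
def treated_multiplier(observed_ratio_shift, control_spillover):
    """Implied multiplier on treated pages if control referrals were themselves
    multiplied by control_spillover as a side effect of the intervention."""
    if control_spillover <= 0:
        raise ValueError("control_spillover must be positive")
    # ratio shift = treated multiplier / control multiplier
    return observed_ratio_shift * control_spillover

observed = 1.82  # level shift in the treated/control ratio reported in the paper
for spill in (0.9, 1.0, 1.1, 1.2):  # hypothetical spillover scenarios
    print(f"spillover x{spill:.1f} -> treated x{treated_multiplier(observed, spill):.2f}")
```

Under this simple model, positive spillover onto controls (values above 1) means the ratio understates the lift on treated pages, while cannibalization of control traffic (values below 1) means it overstates it.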

minor comments (1)

[Abstract and results section] The abstract states the result is 'suggestive, not conclusive' but the main text could more consistently qualify the 1.82x estimate with the placebo result in the results section to prevent selective reading.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these constructive comments on the identification strategy and placebo test. We respond point-by-point below and will incorporate revisions to address the concerns raised.

Referee: [Section 4 and interrupted time-series setup] The claim that untreated pages on the same domain form a valid contemporaneous control that fully absorbs the platform tailwind rests on the assumption of zero spillover. If AEO interventions on the treated subset alter domain-level signals used by LLMs for indexing or referral, the post-intervention rise in the treated/control ratio could be partly attributable to spillover rather than direct treatment; the manuscript provides no direct test or bounding exercise for this possibility.

Authors: We agree that the zero-spillover assumption is central to causal interpretation and that the current manuscript provides neither a direct test nor a bounding exercise. While the on-domain control is designed to net out platform-level growth, domain-wide signals could create spillover. In revision we will expand the methods section to state the assumption explicitly and add a limitations paragraph that (a) discusses the direction and plausible magnitude of any spillover bias and (b) notes that page-level interventions make broad domain-level effects less likely than direct treatment effects. We will also reference the unchanged Google organic trends as indirect supporting evidence.

revision: partial

Referee: [Placebo-in-time permutation test] Placebo-in-time permutation test and pre-period description: the reported p=0.16 already indicates that the 1.82x level shift is not robust to timing permutation. Combined with the explicitly noted short and noisy pre-period, this limits verification of parallel pre-trends in the ratio and weakens the support for attributing the discrete change specifically to the January 2026 interventions.

Authors: The manuscript already reports the p=0.16 placebo result and qualifies the 1.82x estimate as suggestive precisely because of the short, noisy pre-period and the failure of the placebo test to reject the null. We therefore view the current language as appropriately cautious. To further address the referee's concern we will add one sentence in the results section that explicitly links the placebo outcome to the limited ability to verify parallel pre-trends, but we do not believe additional methodological changes are required.

revision: partial

Circularity Check

0 steps flagged

No significant circularity; central estimate is data-driven regression coefficient

The paper's key result is an interrupted time-series regression coefficient (1.82x level shift) estimated directly from first-party server logs contrasting treated vs. untreated pages on the same domain. This is a standard empirical contrast, not a self-definitional equation, fitted input renamed as prediction, or result derived from self-citation. No equations, ansatzes, or uniqueness theorems appear in the provided text. The placebo test and robustness checks are also data-based. The analysis is self-contained against external benchmarks via observed traffic logs rather than internal definitions or author-prior citations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that the untreated pages absorb all platform growth without spillover and that the January 2026 intervention timing is exogenous to other site changes; no free parameters are explicitly fitted beyond the time-series model coefficients, and no new entities are postulated.

axioms (1)

domain assumption: The untreated remainder of the domain acts as a contemporaneous control that absorbs the platform tailwind without spillover from the treated subset. Invoked in the description of the natural experiment design and the interrupted time-series model.

pith-pipeline@v0.9.1-grok · 5897 in / 1443 out tokens · 31856 ms · 2026-06-28T04:40:46.472530+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The Long Tail, Not the Front Page: Cold-Start Prediction of Crowd Highlight Salience
cs.IR 2026-06 unverdicted novelty 4.0

A supervised logistic ranker on embeddings and features beats the lead baseline by 0.044 average precision in retrospective cold-start prediction of crowd highlights.

Reference graph

Works this paper leans on

6 extracted references · 2 canonical work pages · cited by 1 Pith paper

[1] P. Aggarwal, V. Murahari, T. Rajpurohit, A. Kalyan, K. Narasimhan, and A. Deshpande. GEO: Generative Engine Optimization. In Proc. 30th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining (KDD '24), 2024. arXiv:2311.09735. doi:10.1145/3637528.3671900

[2] N. F. Liu, T. Zhang, and P. Liang. Evaluating Verifiability in Generative Search Engines. In Findings of the ACL: EMNLP 2023, pp. 7001–7025, 2023. arXiv:2304.09848. doi:10.18653/v1/2023.findings-emnlp.467

[3] Pew Research Center. Google users are less likely to click on links when an AI summary appears in the results. July 22, 2025. https://www.pewresearch.org/short-reads/2025/07/22/google-users-are-less-likely-to-click-on-links-when-an-ai-summary-appears-in-the-results/

[4] A. K. Wagner, S. B. Soumerai, F. Zhang, and D. Ross-Degnan. Segmented regression analysis of interrupted time series studies in medication use research. Journal of Clinical Pharmacy and Therapeutics, 27(4):299–309, 2002.

[5] W. K. Newey and K. D. West. A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix. Econometrica, 55(3):703–708, 1987.

[6] M. Bertrand, E. Duflo, and S. Mullainathan. How Much Should We Trust Differences-in-Differences Estimates? Quarterly Journal of Economics, 119(1):249–275, 2004.
