WebKnoGraph: GNN-Powered Internal Linking

Emilija Gjorgjevska; Georgina Mirceva; Miroslav Mirchev

arxiv: 2606.06106 · v1 · pith:5UB7SA3T · new · submitted 2026-06-04 · 💻 cs.IR

Pith reviewed 2026-06-27 23:29 UTC · model grok-4.3

classification 💻 cs.IR

keywords internal linking · GraphSAGE · PageRank · authority yield · semantic coherence · GNN · website graph · link optimization

The pith

Automatic link selection via GraphSAGE yields higher authority redistribution than expert-assisted selection, at the cost of semantic coherence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

WebKnoGraph evaluates internal linking strategies by modeling a website as a directed graph whose pages are represented by embeddings, then scoring candidate links with GraphSAGE before testing the effects inside larger simulated host environments. The work compares fully automatic link selection against expert-assisted choices on a production crawl of Kalicube.com, tracking PageRank-based authority metrics and semantic coherence scores. Automatic selection produces stronger authority shifts and higher Authority Yield overall, while expert-assisted selection maintains better coherence and reaches the single highest yield when it targets low-PageRank pages. Practitioners care because link changes can redistribute ranking power and alter navigation in ways that are expensive to reverse after deployment. The framework therefore supplies a pre-release testing loop that scores intervention sets jointly on authority gain, volatility, loss-gain balance, and coherence.

Core claim

The framework models a website as a directed graph, represents pages by embeddings, scores candidate links with GraphSAGE, and evaluates interventions by embedding the site into larger host environments. On the Kalicube.com crawl, automatic selection generally produces stronger authority redistribution with higher Authority Yield, but also larger semantic coherence costs. Expert-assisted selection better preserves semantic coherence and, when targeting low-PageRank pages, achieves the highest Authority Yield, although with the least favorable loss-gain balance.
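As a rough illustration of the link-scoring step, the sketch below applies one GraphSAGE-style mean-aggregation layer to toy page embeddings and scores a candidate link with a sigmoid over the dot product of the resulting vectors. The page names, the 2-d feature vectors, the fixed weight matrices, the choice of neighbourhood, and the dot-product decoder are all assumptions made for illustration; the text above does not describe the paper's trained model or exact architecture.

```python
import math

def mean_vector(vectors, dim):
    """Element-wise mean; a zero vector when a page has no neighbours."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def sage_mean_layer(features, neighbours, w_self, w_neigh):
    """h_v = normalise(ReLU(W_self x_v + W_neigh mean(x_u for u in N(v))))."""
    dim = len(next(iter(features.values())))
    hidden = {}
    for page, x in features.items():
        agg = mean_vector([features[u] for u in neighbours.get(page, [])], dim)
        z = [a + b for a, b in zip(matvec(w_self, x), matvec(w_neigh, agg))]
        h = [max(0.0, value) for value in z]  # ReLU
        norm = math.sqrt(sum(value * value for value in h)) or 1.0
        hidden[page] = [value / norm for value in h]  # L2 normalisation
    return hidden

def link_score(hidden, source, target):
    """Sigmoid of the dot product between source and target representations."""
    dot = sum(a * b for a, b in zip(hidden[source], hidden[target]))
    return 1.0 / (1.0 + math.exp(-dot))

# Hypothetical toy inputs; a real model would learn the weights from the crawl.
features = {"home": [1.0, 0.0], "blog": [0.9, 0.1], "legal": [0.0, 1.0]}
neighbours = {"home": ["blog"], "blog": ["home"], "legal": []}
w_self = [[1.0, 0.0], [0.0, 1.0]]
w_neigh = [[0.5, 0.0], [0.0, 0.5]]

hidden = sage_mean_layer(features, neighbours, w_self, w_neigh)
score_blog = link_score(hidden, "home", "blog")
score_legal = link_score(hidden, "home", "legal")
print(f"home->blog {score_blog:.3f}, home->legal {score_legal:.3f}")
```

With these toy inputs the candidate link to the topically similar page scores higher than the link to the unrelated one. GraphSAGE as usually described also samples a fixed number of neighbours per node, which this sketch omits.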

What carries the argument

The WebKnoGraph framework models the target site as a directed graph, scores candidate links with GraphSAGE, and measures authority and coherence after embedding the site inside a FineWeb-based or Barabási-Albert host graph.
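The host-graph evaluation can be pictured with a minimal pure-Python sketch: build a small Barabási-Albert-style host graph, attach a toy site to it, and compare PageRank before and after adding one internal link. The graph size, the bridge links between host and site, the page names, and the damping factor of 0.85 are assumptions for illustration, not values taken from the paper.

```python
import random

def barabasi_albert_host(n, m, seed=0):
    """Directed preferential-attachment graph: each new node links to m
    existing nodes picked with probability proportional to their degree."""
    rng = random.Random(seed)
    edges, pool = [], []  # pool repeats nodes in proportion to degree
    for u in range(m + 1):  # fully connected core of m + 1 nodes
        for v in range(m + 1):
            if u != v:
                edges.append((u, v))
                pool.append(v)
    for u in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))
        for v in sorted(targets):
            edges.append((u, v))
            pool.extend([u, v])
    return edges

def pagerank(nodes, edges, damping=0.85, iters=100):
    """Power iteration; dangling mass is spread uniformly over all nodes."""
    out = {u: [] for u in nodes}
    for u, v in edges:
        out[u].append(v)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            if out[u]:
                share = damping * rank[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
        rank = new
    return rank

host = barabasi_albert_host(n=200, m=2, seed=42)
site = [("site:home", "site:about"), ("site:home", "site:blog"),
        ("site:about", "site:home"), ("site:blog", "site:home"),
        ("site:deep", "site:home")]
bridges = [(0, "site:home"), ("site:home", 1)]  # hypothetical host<->site links
base_edges = host + site + bridges
nodes = sorted({x for edge in base_edges for x in edge}, key=str)

before = pagerank(nodes, base_edges)
after = pagerank(nodes, base_edges + [("site:home", "site:deep")])  # intervention
delta_deep = after["site:deep"] - before["site:deep"]
print(f"PageRank change for site:deep: {delta_deep:+.6f}")
```

In this toy setup the previously unlinked page gains PageRank once the home page links to it, while the other pages the home page links to each receive a smaller share. That is the kind of redistribution the authority metrics are meant to capture.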

If this is right

Automatic selection produces stronger authority redistribution and higher overall Authority Yield.
Expert-assisted selection preserves semantic coherence more effectively.
Expert-assisted selection aimed at low-PageRank pages reaches the single highest Authority Yield.
Authority Volatility supplies an additional stability signal, although different numbers of intervention sets limit direct comparison.
A usable workflow generates candidate sets at scale, scores them jointly on the four metrics, and routes the best ones for editorial review before deployment.
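One way to read the joint-scoring workflow is as a Pareto filter: keep every candidate intervention set that no other set beats on all four metrics at once, and send those to editorial review. The sketch below assumes that higher is better for authority yield, loss-gain balance, and coherence, and that lower is better for volatility. Those directions, the class and field names, and all the numbers are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InterventionScore:
    name: str
    authority_yield: float    # assumed: higher is better
    volatility: float         # assumed: lower is better
    loss_gain_balance: float  # assumed: higher is better
    coherence: float          # assumed: higher is better

def as_maximised(score):
    """Express all four metrics so that larger always means better."""
    return (score.authority_yield, -score.volatility,
            score.loss_gain_balance, score.coherence)

def dominates(a, b):
    """True when a is at least as good as b everywhere and better somewhere."""
    va, vb = as_maximised(a), as_maximised(b)
    return (all(x >= y for x, y in zip(va, vb))
            and any(x > y for x, y in zip(va, vb)))

def pareto_front(scores):
    """Candidate sets that no other candidate dominates."""
    return [s for s in scores if not any(dominates(o, s) for o in scores)]

# Invented illustrative values, not results from the paper.
candidates = [
    InterventionScore("auto_A", 0.30, 0.10, 0.50, 0.60),
    InterventionScore("auto_B", 0.25, 0.12, 0.45, 0.55),
    InterventionScore("expert_A", 0.20, 0.05, 0.60, 0.90),
    InterventionScore("expert_low_pr", 0.35, 0.20, 0.30, 0.85),
]
review_queue = [s.name for s in pareto_front(candidates)]
print(review_queue)
```

A Pareto filter avoids choosing weights for the four metrics, at the cost of returning several incomparable candidates. That fits a workflow that ends in human review rather than automatic deployment.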

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Hybrid pipelines that let automatic methods propose many candidates and experts pick among them could combine high yield with acceptable coherence.
The same evaluation loop could be applied to other site modifications such as content rewrites or navigation menu changes.
Repeating the experiments across several production sites would reveal whether the observed patterns generalize beyond one crawl.
If live ranking data later contradict the simulated authority gains, the host-graph construction step would need revision.

Load-bearing premise

The premise is that authority and coherence measurements obtained inside the simulated FineWeb or synthetic host graphs will match the outcomes that real search engines produce once the chosen links are added to the live site.

What would settle it

Deploy the top automatic and expert-assisted link sets on the actual Kalicube.com site, then compare subsequent search-ranking shifts, traffic changes, and navigation metrics against an untouched control group.

Figures

Figures reproduced from arXiv: 2606.06106 by Emilija Gjorgjevska, Georgina Mirceva, Miroslav Mirchev.

**Figure 2.** Panels (a) and (b) report the Barabási–Albert results, while […]

**Figure 2.** Authority-flow and semantic-coherence effects by host environment, strategy, and selection regime. Panels (a)–(c) […]

Original abstract

Internal link optimization is a recurring task in search engine optimization, yet many production workflows rely on manual judgment, fixed page templates, or generic tool recommendations. Practitioners need ways to evaluate candidate links before deployment because link changes can redistribute authority and affect semantic coherence in ways that are difficult to isolate after release. We present WebKnoGraph, an open-source framework for evaluating internal linking strategies on website crawls. The framework models a website as a directed graph, represents pages by embeddings, scores candidate links with GraphSAGE, and evaluates interventions by embedding the site into larger host environments. We instantiate WebKnoGraph on a production crawl of Kalicube.com and compare automatic with expert-assisted link selection in an empirical FineWeb-based host graph and a synthetic Barabási-Albert host graph, using PageRank-based authority metrics and semantic coherence. The results show that automatic selection generally produces stronger authority redistribution, with higher Authority Yield, but also larger semantic coherence costs. Expert-assisted selection better preserves semantic coherence and, when targeting low-PageRank pages, achieves the highest Authority Yield, although with the least favorable loss-gain balance. Authority Volatility provides an additional stability perspective, but is interpreted cautiously because the two regimes use different numbers of intervention sets. These findings support a practical workflow in which candidate intervention sets are generated at scale, evaluated jointly across authority gain, volatility, loss-gain balance, and semantic coherence, and then reviewed for editorial deployability before implementation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

WebKnoGraph packages GraphSAGE scoring and host-graph PageRank tests into an open workflow for internal link evaluation, but the reported authority and coherence differences rest on unvalidated proxy graphs.

The paper introduces WebKnoGraph as a framework that crawls a site, embeds pages, scores candidate links with GraphSAGE, and then tests intervention sets by placing the site inside a larger host graph before running PageRank. It compares automatic link selection against expert-assisted choices on Kalicube.com using authority yield, coherence, and volatility metrics in both a FineWeb-derived host graph and a Barabási-Albert synthetic one.

What stands out is the concrete, open-source workflow that lets practitioners generate and score many candidate link sets before deployment. The joint use of authority redistribution and semantic coherence as evaluation criteria matches a real production need in SEO, and the distinction between targeting low-PageRank pages versus others is a sensible practical angle.

The main limitation is the absence of any external check on whether authority deltas measured in these proxy graphs track actual post-deployment changes in search-engine rankings. The abstract supplies no dataset sizes, variance estimates, or correlation with live crawl data, so the claim that automatic selection yields higher authority while expert selection preserves coherence better cannot yet be assessed for robustness. The two selection regimes (automatic and expert-assisted) also use different numbers of intervention sets, which complicates direct comparison of volatility.

This work is aimed at SEO practitioners and applied IR researchers who already use graph methods and want a reusable evaluation layer. It deserves a serious referee because the code is open and the task is well-defined, even though the current evidence is limited to proxy-graph results that still need grounding against real search behavior.

Referee Report

2 major / 2 minor

Summary. The paper presents WebKnoGraph, an open-source framework that models a website as a directed graph, uses page embeddings and GraphSAGE to score candidate internal links, and evaluates interventions by embedding the target site into larger host graphs (a FineWeb-derived empirical graph and a Barabási-Albert synthetic graph). Authority redistribution is measured via PageRank-derived metrics including Authority Yield, while semantic coherence and Authority Volatility are also tracked. On a production crawl of Kalicube.com, the empirical comparison finds that automatic link selection generally produces higher Authority Yield than expert-assisted selection, but at greater coherence cost; expert-assisted selection preserves coherence better and can achieve the highest Authority Yield when targeting low-PageRank pages.

Significance. If the proxy host graphs produce authority deltas that track real post-deployment PageRank changes, the framework would offer a practical pre-deployment evaluation workflow for internal linking that jointly considers authority gain, volatility, loss-gain balance, and semantic coherence. The open-source release and use of GNN-based scoring are concrete strengths that could support reproducibility and extension.

major comments (2)

[Evaluation section] Evaluation section (host-graph construction and results): All quantitative comparisons of Authority Yield, coherence, and volatility rest on the assumption that embedding Kalicube.com into the FineWeb-based or Barabási-Albert host graphs yields authority deltas that generalize to real search-engine behavior after link deployment. No external validation, correlation with live crawl data, or sensitivity analysis across host-graph choices is reported, so the relative ordering of automatic vs. expert-assisted strategies cannot yet be treated as deployable evidence.
[Results] Results paragraphs on Authority Yield and coherence: The claims that automatic selection 'generally produces stronger authority redistribution' and expert-assisted 'achieves the highest Authority Yield when targeting low-PageRank pages' are presented without reported dataset sizes for the intervention sets, error bars, or statistical tests. This makes it impossible to determine whether the observed differences are robust or merely artifacts of the particular crawl and graph realizations.

minor comments (2)

[Abstract and Results] The abstract and results sections would benefit from explicit statements of the number of pages in the Kalicube.com crawl, the number of candidate links evaluated, and the precise definitions of Authority Yield and loss-gain balance (including any free parameters).
[Figures and Evaluation] Figure captions and the description of the two selection regimes should clarify why different numbers of intervention sets are used and how this affects the interpretation of Authority Volatility.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback on the evaluation assumptions and results presentation. We respond to each major comment below, clarifying the framework's scope as a proxy-based pre-deployment tool while committing to clarifications and additions where possible.

Referee: [Evaluation section] Evaluation section (host-graph construction and results): All quantitative comparisons of Authority Yield, coherence, and volatility rest on the assumption that embedding Kalicube.com into the FineWeb-based or Barabási-Albert host graphs yields authority deltas that generalize to real search-engine behavior after link deployment. No external validation, correlation with live crawl data, or sensitivity analysis across host-graph choices is reported, so the relative ordering of automatic vs. expert-assisted strategies cannot yet be treated as deployable evidence.

Authors: The framework is explicitly designed as a simulation using proxy host graphs to enable relative comparisons of linking strategies prior to deployment, rather than a direct model of live search-engine dynamics. The manuscript already cautions on interpretation for Authority Volatility due to differing intervention set sizes. We will revise the evaluation section to more explicitly articulate the proxy limitations and emphasize that results provide comparative insights within the modeled environments, not absolute deployable predictions. Additional sensitivity analysis beyond the empirical and synthetic graphs used is outside the current study scope. (revision: partial)

Referee: [Results] Results paragraphs on Authority Yield and coherence: The claims that automatic selection 'generally produces stronger authority redistribution' and expert-assisted 'achieves the highest Authority Yield when targeting low-PageRank pages' are presented without reported dataset sizes for the intervention sets, error bars, or statistical tests. This makes it impossible to determine whether the observed differences are robust or merely artifacts of the particular crawl and graph realizations.

Authors: We will add the exact sizes of the intervention sets to the results section in the revision. The experiments rely on single realizations of the host graphs, and we will include an explicit statement noting the lack of error bars or statistical tests as a limitation of the current setup, aligning with the existing cautious interpretation of Authority Volatility. (revision: yes)
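The referee's request for error bars could be met, for example, by repeating the simulation over several host-graph realizations and bootstrapping the difference in mean Authority Yield between the two regimes. The sketch below shows a percentile bootstrap using only the standard library. The function name, the per-realization yield values, and the choice of 2000 resamples are assumptions made for illustration, and the numbers are invented rather than taken from the paper.

```python
import random
import statistics

def bootstrap_diff_ci(sample_a, sample_b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(sample_a) - mean(sample_b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resample_a = [rng.choice(sample_a) for _ in sample_a]
        resample_b = [rng.choice(sample_b) for _ in sample_b]
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    lower = diffs[int(n_boot * alpha / 2)]
    upper = diffs[min(n_boot - 1, int(n_boot * (1 - alpha / 2)))]
    return lower, upper

# Invented Authority Yield values, one per hypothetical host-graph realization.
automatic_yield = [0.31, 0.28, 0.35, 0.30, 0.33]
expert_yield = [0.22, 0.25, 0.20, 0.24, 0.21]

lower, upper = bootstrap_diff_ci(automatic_yield, expert_yield)
print(f"95% bootstrap interval for the yield difference: [{lower:.3f}, {upper:.3f}]")
```

An interval that excludes zero would suggest the ordering of the two regimes is stable across realizations within the modeled environment. It would say nothing about live search-engine behavior, which is the objection left unresolved.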

standing simulated objections not resolved

External validation, correlation with live crawl data, and post-deployment PageRank changes remain unaddressed, as these require production search engine access and real-world deployment experiments that are not feasible within this study.

Circularity Check

0 steps flagged

No circularity; empirical metrics computed directly from simulations on proxy graphs.

The paper describes modeling a site as a graph, scoring links via GraphSAGE, embedding into FineWeb or Barabási-Albert host graphs, and computing PageRank-derived metrics (Authority Yield, coherence, volatility) for automatic vs. expert-assisted interventions. No equations, fitted parameters, or self-citations are shown that reduce any reported result to a definition or input by construction. All quantitative comparisons are independent simulation outputs, not tautological renamings or self-referential fits. The generalization from proxy graphs is an external-validity concern, not a circularity issue.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The abstract relies on standard graph-theoretic modeling and the assumption that PageRank and embedding-based coherence are appropriate proxies; no free parameters, ad-hoc axioms, or invented entities are named.

axioms (2)

domain assumption: Websites can be faithfully represented as directed graphs whose nodes carry semantic embeddings.
Invoked when the framework 'models a website as a directed graph' and 'represents pages by embeddings'.
domain assumption: Authority redistribution after link changes can be approximated by PageRank recomputation inside an external host graph.
Central to the evaluation of 'Authority Yield' and 'Authority Volatility'.
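To make the embedding-based coherence proxy concrete, the sketch below scores an intervention set by the mean cosine similarity between the embeddings of each link's source and target page. This is only one plausible reading: the paper's actual coherence definition is not given in the abstract, and the function names, page names, and vectors here are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def mean_link_coherence(embeddings, links):
    """Average source-target similarity over a set of proposed links."""
    if not links:
        return 0.0
    total = sum(cosine(embeddings[s], embeddings[t]) for s, t in links)
    return total / len(links)

# Hypothetical page embeddings; real ones would come from a text embedder.
embeddings = {
    "guide": [1.0, 0.0],
    "tutorial": [1.0, 0.0],
    "careers": [0.0, 1.0],
}
on_topic = mean_link_coherence(embeddings, [("guide", "tutorial")])
off_topic = mean_link_coherence(embeddings, [("guide", "careers")])
mixed = mean_link_coherence(embeddings, [("guide", "tutorial"), ("guide", "careers")])
print(on_topic, off_topic, mixed)
```

Under this reading, a set that adds many links between unrelated pages lowers the average. That is one way the coherence cost reported for automatic selection could arise.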

pith-pipeline@v0.9.1-grok · 5798 in / 1411 out tokens · 19977 ms · 2026-06-27T23:29:01.726999+00:00 · methodology

