arxiv: 2604.13784 · v2 · submitted 2026-04-15 · 💻 cs.SI


Citation Farming on ResearchGate: Blatant and Effective

Cenk Erdogan, Bennett Daniel, Benedikt Wotka, Ashish Sai, Adriana Iamnitchi


Pith reviewed 2026-05-10 11:48 UTC · model grok-4.3


keywords: citation farming · ResearchGate · citation boosting · reference lists · citation networks · coordinated activity · academic metrics


The pith

Papers from suspected boosting accounts on ResearchGate form clusters with identical reference lists that disproportionately cite certain authors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines nearly 3000 papers uploaded by five accounts suspected of providing citation boosting services on ResearchGate. It builds citation networks at both the paper and author levels and defines equal references groups as clusters of papers that share exactly the same reference lists. These groups appear frequently in the collection and cause the papers to cite a small set of authors far more than would be expected under independent scholarly practice. For some authors a large fraction of their total citations originates from these clusters. A separate validation network shows the motif is uncommon in ordinary scientific output.
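The two network levels mentioned above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `build_networks`, the input dictionaries, and the choice to weight author-level edges by the number of paper-level citations are all assumptions made for the example.

```python
from collections import Counter

def build_networks(paper_refs, authors_of):
    """Return (paper_edges, author_edges) for a small citation dataset.

    paper_refs: hypothetical mapping, citing paper ID -> list of cited work IDs.
    authors_of: hypothetical mapping, work ID -> list of author names.
    """
    # Paper level: one directed edge per (citing paper, cited work) pair.
    paper_edges = {(p, r) for p, refs in paper_refs.items() for r in refs}

    # Author level: each paper-level edge adds weight 1 from every citing
    # author to every cited author (an assumed weighting scheme).
    author_edges = Counter()
    for citing, cited in paper_edges:
        for a in authors_of.get(citing, ()):
            for b in authors_of.get(cited, ()):
                author_edges[(a, b)] += 1
    return paper_edges, author_edges

paper_refs = {"p1": ["r1"], "p2": ["r1"]}
authors_of = {"p1": ["X"], "p2": ["X", "Y"], "r1": ["A"]}
paper_edges, author_edges = build_networks(paper_refs, authors_of)
print(sorted(paper_edges))
print(dict(author_edges))
```

Works with no known authors simply contribute no author-level edges, which is one of several reasonable conventions.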

Core claim

The central discovery is that equal references groups function as an interpretable structural signal for coordinated or automated citation boosting. In the analyzed collection many papers belong to such groups, resulting in disproportionate citation to a small set of authors, and a substantial share of some authors' citations can be traced directly to these suspicious clusters rather than to independent citing papers.

What carries the argument

Equal references groups: clusters of papers that share identical reference lists, serving as the detectable pattern that distinguishes coordinated boosting from normal citation behavior.
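A minimal sketch of how such groups could be found, assuming each paper's references have already been resolved to stable IDs. The function name `equal_reference_groups` and the decision to ignore order and duplicates within a list are illustrative assumptions; the paper may define equality differently.

```python
from collections import defaultdict

def equal_reference_groups(papers, min_size=2):
    """Group paper IDs whose reference lists contain exactly the same items.

    papers: hypothetical mapping, paper ID -> iterable of reference IDs.
    Order and duplicates inside a list are ignored (an assumption).
    """
    buckets = defaultdict(list)
    for paper_id, refs in papers.items():
        key = frozenset(refs)
        if key:  # papers with no extracted references are skipped
            buckets[key].append(paper_id)
    # Only buckets holding at least min_size papers count as groups.
    return [sorted(ids) for ids in buckets.values() if len(ids) >= min_size]

papers = {
    "p1": ["r1", "r2", "r3"],
    "p2": ["r3", "r2", "r1"],  # same set as p1, different order
    "p3": ["r1", "r2"],        # a strict subset is not "equal"
    "p4": ["r9"],
    "p5": ["r9"],
    "p6": [],
}
print(sorted(equal_reference_groups(papers)))  # [['p1', 'p2'], ['p4', 'p5']]
```

Hashing each list once keeps the grouping linear in the total number of references, so it stays cheap for a collection of a few thousand papers.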

If this is right

Citation counts for certain authors can be substantially inflated by citations originating inside these coordinated clusters.
The equal references motif offers a concrete, checkable indicator that can be applied to other citation networks to flag possible boosting.
Author-level impact metrics on ResearchGate become less reliable when a measurable fraction of citations arrives through such groups.
The observed pattern demonstrates that boosting services can alter citation distributions at scale on the platform.
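The first and third points rest on one quantity: the fraction of an author's citations that arrive from papers inside equal references groups. A hedged sketch of that calculation follows; `group_citation_share` and its inputs are hypothetical names, and the share is computed only over citations visible in the supplied dataset, not over an author's platform-wide total.

```python
from collections import Counter

def group_citation_share(papers, ref_authors, grouped_papers):
    """Per cited author, the fraction of citations that come from grouped papers.

    papers: citing paper ID -> list of reference IDs (hypothetical input).
    ref_authors: reference ID -> list of author names (hypothetical input).
    grouped_papers: set of paper IDs that belong to an equal references group.
    """
    total, from_groups = Counter(), Counter()
    for paper_id, refs in papers.items():
        for ref in set(refs):  # count each cited work once per citing paper
            for author in ref_authors.get(ref, ()):
                total[author] += 1
                if paper_id in grouped_papers:
                    from_groups[author] += 1
    return {author: from_groups[author] / total[author] for author in total}

papers = {"p1": ["r1", "r2"], "p2": ["r1", "r2"], "p3": ["r1"]}
ref_authors = {"r1": ["A"], "r2": ["B"]}
shares = group_citation_share(papers, ref_authors, grouped_papers={"p1", "p2"})
print(shares)  # A receives 2 of 3 citations from the group, B receives 2 of 2
```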

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same structural signal could be tested on citation data from other platforms to estimate how widespread coordinated boosting has become.
Evaluations that rely on raw citation totals may benefit from filtering or weighting papers that belong to equal references clusters.
Longitudinal tracking of authors who receive many citations from these groups could reveal whether the boosted metrics persist after the activity stops.
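One simple form of the filtering idea is to count every set of papers with an identical reference list as a single citing paper. The sketch below illustrates that choice only; `citation_counts` and the `collapse_equal_lists` flag are hypothetical, and other weighting schemes (for example dividing by group size) would be equally defensible.

```python
from collections import Counter

def citation_counts(papers, collapse_equal_lists=False):
    """Count citations per reference, optionally counting each distinct
    reference list only once regardless of how many papers share it."""
    seen = set()
    counts = Counter()
    for refs in papers.values():
        key = frozenset(refs)
        if collapse_equal_lists:
            if key in seen:
                continue  # another paper with this exact list was already counted
            seen.add(key)
        counts.update(key)
    return counts

papers = {
    "p1": ["r1", "r2"],
    "p2": ["r1", "r2"],
    "p3": ["r1", "r2"],
    "p4": ["r1", "r3"],
}
print(citation_counts(papers))                             # r1: 4, r2: 3, r3: 1
print(citation_counts(papers, collapse_equal_lists=True))  # r1: 2, r2: 1, r3: 1
```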

Load-bearing premise

The five accounts are correctly identified as boosting-service providers and identical reference lists primarily indicate coordination rather than legitimate reuse such as templates or narrow subfield conventions.

What would settle it

The claim would be undermined if a large sample of papers known to be authored independently, without boosting involvement, contained equal-reference-list clusters at rates comparable to those in the studied collection.
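The comparison this test calls for reduces to one rate per collection: the fraction of papers that share their exact reference list with at least one other paper. A minimal sketch, with `grouped_fraction` and both toy collections invented for illustration:

```python
from collections import Counter

def grouped_fraction(papers):
    """Fraction of papers (with at least one reference) whose reference
    list is identical to that of another paper in the same collection."""
    keys = [frozenset(refs) for refs in papers.values() if refs]
    if not keys:
        return 0.0
    sizes = Counter(keys)
    return sum(1 for key in keys if sizes[key] >= 2) / len(keys)

# Toy data only; neither collection reflects the paper's actual numbers.
suspect = {"s1": ["a", "b"], "s2": ["b", "a"], "s3": ["a", "b"], "s4": ["c"]}
control = {"c1": ["a"], "c2": ["a", "b"], "c3": ["b", "c"], "c4": ["d"]}
print(grouped_fraction(suspect))  # 0.75
print(grouped_fraction(control))  # 0.0
```

A real comparison would also need a significance test and a control sample matched on field and paper type, which this sketch does not attempt.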

Figures

Figures reproduced from arXiv: 2604.13784 by Adriana Iamnitchi, Ashish Sai, Benedikt Wotka, Bennett Daniel, Cenk Erdogan.

**Figure 3.** Scatter plot showing the relationship between the …

**Figure 2.** Two equal references groups. Green nodes denote …

**Figure 4.** Comparison of numbers of groups and groups size …

Original abstract

We investigate platform-native citation farming on ResearchGate by analyzing almost 3000 papers uploaded by five suspected boosting-service provider accounts. From the uploaded papers and associated metadata, we construct both paper-level and author-level citation networks. We introduce an interpretable structural signal for coordinated boosting, equal references groups: clusters of papers with equal reference lists. We find that many papers from our collection exhibit this motif, that is, they disproportionately cite a small set of authors, consistent with coordinated or automated boosting rather than independent scholarly practice. Finally, we show that for some authors in our dataset a substantial share of their citations can be attributed to these suspicious groups. A different citation network was used to validate the rareness of such motifs in legitimate scientific work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a clean, practical new motif for spotting coordinated citation boosting on ResearchGate and shows it operating at scale in their sample, but the control comparison leaves room for legitimate reuse explanations.


The main thing here is the equal-references-groups signal: clusters of papers that share identical reference lists and disproportionately point to a small set of authors. They pull this from nearly 3000 papers uploaded by five accounts they flag as suspected boosting providers, build both paper-level and author-level networks, and report that the motif appears often in their collection while a separate citation network shows it is rare in ordinary work. For some authors in the set, a substantial fraction of citations trace back to these groups. That is the concrete contribution, and it is easy to understand and apply to platform data without heavy modeling.

The construction of the networks from uploaded papers and metadata is straightforward and the contrast with the control network supplies direct empirical grounding. The motif itself does not appear in the prior work they cite, so the structural observation is new.

The soft spot is exactly the one the stress-test flags. Equal reference lists can arise from template papers, shared datasets, or tight subfield conventions, and the control network is not described as matched on those dimensions. If those patterns occur at non-trivial rates outside boosting, the motif loses specificity and the claim that a substantial share of citations comes from suspicious groups becomes harder to pin down. The accounts are labeled suspected without the selection criteria spelled out in the abstract, which adds another layer of uncertainty.

This work is for people who track citation manipulation, platform integrity, or the reliability of metrics used in hiring and funding. It is not a broad theoretical advance but a targeted detection tool with real data behind it. I would send it to peer review because the idea is interpretable, the scale is decent, and the limitation is fixable with better controls or field-specific checks rather than fatal.

Referee Report

3 major / 2 minor

Summary. The paper analyzes nearly 3000 papers uploaded by five suspected boosting-service provider accounts on ResearchGate. It constructs paper- and author-level citation networks from the uploaded papers and metadata, introduces 'equal references groups' (clusters of papers sharing identical reference lists) as an interpretable signal of coordinated boosting, reports that many papers in the collection exhibit this motif and disproportionately cite a small set of authors, shows that a substantial share of citations to some authors in the dataset can be attributed to these groups, and validates the motif's rarity via comparison to a separate legitimate citation network.

Significance. If the account identification and motif specificity hold, the work supplies direct empirical evidence of platform-native citation farming on ResearchGate, illustrating how automated or coordinated uploading can inflate author-level metrics. The network construction from actual uploaded papers and the contrast with a control network provide concrete, replicable support for detecting such activity, which could inform platform moderation and studies of citation integrity.

major comments (3)

[Methods] Methods (account selection): The five accounts are labeled 'suspected' boosting-service providers, yet the specific criteria, metadata patterns, or evidence used to identify them are described only at a high level. This is load-bearing because the full dataset, motif analysis, and citation-attribution claims rest on these accounts being the source of the farming activity rather than ordinary uploads.
[Validation] Validation section: The separate citation network used to establish that equal-references motifs are rare in legitimate work is not described as matched on paper genre, methodological overlap, or narrow subfields (e.g., data-descriptor or protocol papers where reference-list reuse is common). Without such controls, the motif's specificity as a signal of coordination versus legitimate reuse is not fully demonstrated.
[Results] Results (disproportionate citation): The claim that papers in equal-references groups 'disproportionately cite a small set of authors' requires an explicit quantitative baseline (e.g., comparison against the degree distribution or a null model in the constructed network) to support the interpretation of coordinated boosting over independent scholarly practice.

minor comments (2)

[Abstract] Abstract: The phrase 'a different citation network' should specify its approximate size, source, or construction method to give readers immediate context for the validation step.
[Introduction] Terminology: Ensure 'equal references groups' is defined with precise operational criteria (exact string match on reference lists, or allowing minor formatting variations) at first use and applied consistently.
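The distinction raised in the last minor comment, exact string match versus tolerance for minor formatting variations, can be made concrete. The sketch below is an assumption about what "minor formatting variations" might mean (case, accents, punctuation, spacing); `normalize_reference` and `reference_key` are hypothetical helpers, not the paper's operational definition.

```python
import re
import unicodedata

def normalize_reference(ref):
    """Crude normalization: strip accents, lowercase, and collapse every
    run of characters outside a-z/0-9 into a single space."""
    text = unicodedata.normalize("NFKD", ref)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def reference_key(refs, exact=True):
    """Key used to decide whether two reference lists are 'equal'."""
    if exact:
        return tuple(refs)  # strict: same strings in the same order
    return frozenset(normalize_reference(r) for r in refs)

list_a = ["Erdoğan, C. (2026). Citation Farming.", "Wren & Georgescu 2022"]
list_b = ["Erdogan, C (2026) citation farming", "wren  &  georgescu, 2022"]

print(reference_key(list_a) == reference_key(list_b))                            # False
print(reference_key(list_a, exact=False) == reference_key(list_b, exact=False))  # True
```

Looser matching finds more groups but also raises the risk of merging references that are genuinely different, so whichever criterion is chosen should be stated alongside the reported group counts.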

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and valuable feedback on our manuscript. We have carefully considered each major comment and provide point-by-point responses below, along with planned revisions to address the concerns raised.


Referee: [Methods] Methods (account selection): The five accounts are labeled 'suspected' boosting-service provider accounts, yet the specific criteria, metadata patterns, or evidence used to identify them are described only at a high level. This is load-bearing because the full dataset, motif analysis, and citation-attribution claims rest on these accounts being the source of the farming activity rather than ordinary uploads.

Authors: We agree that additional detail on the account selection process would improve the transparency of the study. In the revised manuscript, we will expand the Methods section to provide more specific information on the metadata patterns and observable behaviors that led us to suspect these accounts of providing boosting services. We note that complete disclosure of all identification heuristics is balanced against the risk of enabling further gaming of the platform, but we will include sufficient detail to allow readers to understand the basis for our selection. This addresses the load-bearing nature of the identification while maintaining the integrity of the analysis.

Revision: yes

Referee: [Validation] Validation section: The separate citation network used to establish that equal-references motifs are rare in legitimate work is not described as matched on paper genre, methodological overlap, or narrow subfields (e.g., data-descriptor or protocol papers where reference-list reuse is common). Without such controls, the motif's specificity as a signal of coordination versus legitimate reuse is not fully demonstrated.

Authors: The referee correctly identifies a limitation in our validation approach. The control network was selected as a broad sample of legitimate scientific work from a different platform or corpus, but it was not explicitly matched for genre or subfield. In the revision, we will add a dedicated paragraph in the Validation section discussing this choice, including why we believe the motif remains rare even across varied fields, and we will explore the possibility of adding a more targeted control set if feasible with available data. We maintain that the stark contrast observed supports the motif as a strong signal, but we will qualify the interpretation accordingly.

Revision: partial

Referee: [Results] Results (disproportionate citation): The claim that papers in equal-references groups 'disproportionately cite a small set of authors' requires an explicit quantitative baseline (e.g., comparison against the degree distribution or a null model in the constructed network) to support the interpretation of coordinated boosting over independent scholarly practice.

Authors: We appreciate this suggestion for strengthening the quantitative support. In the revised Results section, we will include an explicit comparison of the citation patterns in equal-references groups against the overall degree distribution in the author-level citation network. Additionally, we will describe a simple null model (e.g., random citation assignment preserving the number of references) to demonstrate that the observed concentration on a small set of authors is statistically unlikely under independent practice. This will provide the requested baseline and bolster the interpretation of coordinated activity.

Revision: yes
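A null model of the kind mentioned in the last response, random citation assignment preserving each paper's number of references, might look like the sketch below. Everything here is an illustrative assumption: the statistic (`top_share`, the share of all citations going to the `k` most-cited works), the uniform sampling from the pool of cited works, and the function names. A uniform null ignores real differences in popularity between works, so it is a weak baseline rather than a definitive test.

```python
import random
from collections import Counter

def top_share(papers, k=1):
    """Share of all citations received by the k most-cited references."""
    counts = Counter(ref for refs in papers.values() for ref in set(refs))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(k)) / total

def null_p_value(papers, k=1, trials=1000, seed=0):
    """Empirical p-value for the observed top-k share under a null where
    each paper keeps its reference count but picks references uniformly
    at random from the pool of all cited works."""
    rng = random.Random(seed)
    pool = sorted({ref for refs in papers.values() for ref in refs})
    observed = top_share(papers, k)
    hits = 0
    for _ in range(trials):
        shuffled = {
            pid: rng.sample(pool, len(set(refs)))
            for pid, refs in papers.items()
        }
        if top_share(shuffled, k) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (trials + 1)

# Toy data: six papers that all cite "star" plus one unique other work.
papers = {f"p{i}": ["star", f"other{i}"] for i in range(6)}
observed, p = null_p_value(papers)
print(observed, p)
```

A degree-preserving rewiring of the citation graph would be a stricter alternative, since it also holds each cited work's popularity fixed.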

Circularity Check

0 steps flagged

No circularity: direct network construction and motif counting from source data


The paper constructs paper- and author-level citation networks directly from the ~3000 uploaded papers and metadata, defines equal-references groups as clusters sharing identical reference lists, counts their occurrence, and compares motif frequency against a separate control citation network. No equations, fitted parameters, or derivations are present that reduce a claimed result to the inputs by construction. The control network serves as external validation rather than a self-referential step. No self-citations are invoked as load-bearing premises, and the analysis does not rename or smuggle in prior results via ansatz. This is a standard empirical network study whose central claims rest on observable counts rather than definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim depends on the domain assumption that the five accounts are boosting providers and that identical reference lists are a reliable marker of coordination rather than other legitimate causes.

axioms (2)

domain assumption: The five accounts are suspected boosting-service provider accounts.
Paper begins analysis from these accounts without detailing selection criteria beyond 'suspected'.
domain assumption: Equal reference lists indicate coordinated or automated boosting rather than independent scholarly practice.
This is the interpretive step linking the observed motif to the claim of citation farming.

invented entities (1)

equal references groups (no independent evidence)
purpose: Structural signal for detecting coordinated citation boosting
New motif defined and measured in the paper; no independent evidence outside this study is provided.

pith-pipeline@v0.9.0 · 5428 in / 1396 out tokens · 26643 ms · 2026-05-10T11:48:19.907591+00:00 · methodology

