pith. machine review for the scientific record.

arxiv: 2605.11930 · v1 · submitted 2026-05-12 · 💻 cs.DL · cs.SI

Recognition: no theorem link

Citation Cliques in Low Impact Journals


Pith reviewed 2026-05-13 04:09 UTC · model grok-4.3

classification 💻 cs.DL cs.SI
keywords citation networks, low-impact journals, author cohesion, bibliometrics, citation cliques, reciprocity, Eigenfactor, citation economies

The pith

Authors in low-impact journals cite each other at 6.7 times the rate and with 4.7 times the reciprocity of matched authors in high-impact venues.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper matches authors from low- and high-impact journals on subject area and h5-index, then compares their citation patterns using Crossref data. It finds markedly denser and more reciprocal author-to-author citations within the low-impact group, including cliques that form closed citation economies. A reader should care because these patterns can inflate bibliometric indicators and create a segregated citation landscape between high- and low-visibility venues.

Core claim

Using subject-normalized Eigenfactor percentiles to label venues and a 10 percent sample of 9,431 matched author pairs, the study shows low-impact authors exhibit 6.7 times higher co-author citation rates and 4.7 times higher reciprocity than high-impact controls. A hybrid detection pipeline isolates 277 outliers with 93.5 percent low-impact purity that display an 11-fold clique-strength increase, revealing a Two Worlds segregation where low-impact venues operate as inward-looking citation networks rather than participating in open exchange.
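This page does not define the two cohesion metrics behind the 6.7x and 4.7x figures. The sketch below is a minimal version under two assumptions. First, the co-author citation rate is the share of an author's outgoing author-to-author citations that go to that author's own co-authors. Second, reciprocity is the share of distinct cited authors who cite back at least once. The function name and both definitions are illustrative assumptions, not the paper's stated method.

```python
from collections import defaultdict


def cohesion_metrics(citations, coauthors):
    """Per-author cohesion metrics on a directed author-to-author citation graph.

    citations: iterable of (citing_author, cited_author) pairs, one per citation.
    coauthors: dict mapping each author to the set of their co-authors.
    Both metric definitions are illustrative assumptions.
    """
    out_counts = defaultdict(int)       # outgoing citations per author
    coauthor_counts = defaultdict(int)  # outgoing citations that reach a co-author
    targets = defaultdict(set)          # distinct authors cited by each author
    for src, dst in citations:
        if src == dst:
            continue  # self-citations are excluded from both metrics in this sketch
        out_counts[src] += 1
        targets[src].add(dst)
        if dst in coauthors.get(src, set()):
            coauthor_counts[src] += 1

    metrics = {}
    for author, total in out_counts.items():
        cited = targets[author]
        # a -> b counts as reciprocated if b cites a at least once.
        reciprocated = sum(1 for b in cited if author in targets.get(b, set()))
        metrics[author] = {
            "coauthor_citation_rate": coauthor_counts[author] / total,
            "reciprocity_rate": reciprocated / len(cited),
        }
    return metrics
```

Under these assumptions, a Case-Control multiplier would be a ratio of group means of the per-author values. This page does not say whether the paper aggregates the metrics that way.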

What carries the argument

Author matching by subject area and h5-index, followed by aggregate comparison of citation cohesion metrics (co-author citation rates and reciprocity) and a subject-aware outlier detection pipeline that identifies cliques and their hub-and-spoke topologies.
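The matching step can be sketched as follows. The sketch assumes that h5 is the h-index over an author's papers from the five most recent complete years. It also assumes that each Case author is paired 1:1 with an unused Control author from the same subject, provided their h5 values differ by no more than a tolerance (`max_gap`, where 0 means an exact match). The function names and the greedy nearest-neighbour strategy are hypothetical, because this page does not give the paper's exact matching rule.

```python
from collections import defaultdict


def h_index(citation_counts):
    """Largest h such that at least h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citation_counts, reverse=True), start=1):
        if cites < rank:
            break
        h = rank
    return h


def h5_index(papers, current_year):
    """h-index over papers from the five complete years before current_year (assumed window).

    papers: iterable of (publication_year, citation_count).
    """
    recent = [cites for year, cites in papers if current_year - 5 <= year < current_year]
    return h_index(recent)


def match_authors(cases, controls, max_gap=0):
    """Greedy 1:1 matching within subject on h5, without replacement.

    cases, controls: lists of (author_id, subject, h5).
    max_gap: largest tolerated |h5 difference| (0 means exact h5 match).
    Returns (case_id, control_id) pairs; Case authors with no eligible control are skipped.
    """
    pool = defaultdict(list)
    for ctrl_id, subject, h5 in controls:
        pool[subject].append((h5, ctrl_id))
    pairs = []
    for case_id, subject, h5 in cases:
        candidates = pool.get(subject)
        if not candidates:
            continue
        best = min(range(len(candidates)), key=lambda i: abs(candidates[i][0] - h5))
        if abs(candidates[best][0] - h5) > max_gap:
            continue
        _, ctrl_id = candidates.pop(best)  # each control is used at most once
        pairs.append((case_id, ctrl_id))
    return pairs
```

Greedy matching depends on the order in which Case authors are processed. An optimal assignment, such as one computed with the Hungarian algorithm, would remove that order dependence at a higher computational cost.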

If this is right

  • Low-impact venues sustain segregated citation economies that inflate their own bibliometric scores.
  • Cohesion, rather than one-way asymmetry, is the dominant driver of the observed Case-Control gap.
  • Outlier cliques display directed flows from peripheral authors toward central beneficiaries, not equal exchange.
  • The Two Worlds pattern (correlation 0.71) implies citation-based ranking systems systematically misrepresent influence across impact strata.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Citation-based evaluation of researchers who publish mainly in low-impact venues may need separate normalization to avoid over- or under-counting due to local reciprocity.
  • The same matching-and-cohesion approach could be applied to conference proceedings or preprint servers to test whether the pattern generalizes beyond journals.
  • If the closed economies persist over time, they could widen the visibility gap between high- and low-impact communities even when underlying research quality is comparable.

Load-bearing premise

Matching authors solely by subject area and h5-index fully removes confounding differences between the low- and high-impact groups, so that observed cohesion gaps can be attributed to venue impact level.

What would settle it

A re-analysis that adds further author-level controls (such as career length, institutional prestige, or total publication count) and finds the cohesion and reciprocity differences disappear or reverse.
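Such a re-analysis would normally also report whether the added controls are balanced after matching. The usual diagnostic is the standardized mean difference (SMD) for each covariate. The sketch below assumes that per-author covariates, such as total publication count and mean team size, are available. The function names and the |SMD| < 0.1 cutoff are illustrative assumptions; the cutoff is a common rule of thumb rather than a value taken from the paper.

```python
import math
import statistics


def standardized_mean_difference(case_values, control_values):
    """SMD = (mean_case - mean_control) / sqrt((var_case + var_control) / 2).

    Needs at least two values per group (sample variance).
    """
    m1, m0 = statistics.mean(case_values), statistics.mean(control_values)
    pooled = math.sqrt((statistics.variance(case_values) + statistics.variance(control_values)) / 2)
    if pooled == 0:
        return 0.0 if m1 == m0 else math.copysign(math.inf, m1 - m0)
    return (m1 - m0) / pooled


def balance_table(pairs, covariates, threshold=0.1):
    """Check covariate balance across matched pairs.

    pairs: list of (case_id, control_id).
    covariates: {covariate_name: {author_id: value}}, e.g. publication count, mean team size.
    Returns {covariate_name: (smd, is_balanced)} using |SMD| < threshold.
    """
    report = {}
    for name, values in covariates.items():
        smd = standardized_mean_difference(
            [values[case] for case, _ in pairs],
            [values[ctrl] for _, ctrl in pairs],
        )
        report[name] = (smd, abs(smd) < threshold)
    return report
```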

Figures

Figures reproduced from arXiv: 2605.11930 by Diomidis Spinellis, Grigorios Alexandrou, Panagiotis-Alexios Spanakis.

Figure 1. Analysis DAG for the social-network stage. Ellipses represent derived tables and arrows …
Figure 2. Co-author citation gap by subject field. Forest plot of the mean paired difference $\Delta = \bar{r}_{\text{Case}} - \bar{r}_{\text{Control}}$ in co-author citation rate, with 95% confidence intervals. Marker • = field estimate; ♦ = overall. Δ labels report rounded effect sizes. The dashed vertical line marks zero (no difference); the pale red shading covers the positive region where Case authors exceed Controls.
Figure 3. …, where cohesion and concentration metrics rank as the most discriminative features for tier classification, while authority metrics such as eigenvector centrality play a secondary role. Plot axis: mean decrease in impurity (0.00 to 0.30). Features shown include Local Clustering, Incoming Entropy, Incoming HHI, Norm. Triangles, Clique Strength, K-Core Number, Reciprocity Rate, Eigenvector Centr., Co-author Cit. Rate, Self-Citatio…
Figure 4. Subject-specific effect sizes (Cliff's δ). Each cell reports the effect size for a given metric within a subject area. Positive values (white–pale red) indicate that Cases exhibit higher values than Controls. Negative values (blue) indicate that Controls exhibit higher values than Cases. Co-author Citation Rate shows near-zero or slightly positive values across all subjects, while velocity and burstiness a…
Figure 5. Tier separability: LDA projection. Kernel density estimates of the scores produced by projecting each author's standardised feature vector onto the single Linear Discriminant Analysis (LDA) axis that maximally separates Case from Control authors. Dotted vertical lines mark within-tier medians. Negative scores correspond to the cohesion-driven regime (higher co-author citation, clustering, and reciprocity), …
Figure 6. Outlier behavioural fingerprint. Radar chart of outlier fold-change ratios (log-scaled radial axis) relative to non-outlier authors across six citation-behaviour metrics. Values on each spoke give the outlier-to-normal mean ratio; concentric rings mark 1×, 10×, and 100× baselines. The dashed inner ring corresponds to 1× (no difference).
Figure 7. Largest outlier citation syndicate (n = 23). Force-directed layout of the internal directed citation network. Node size is proportional to betweenness centrality; salmon = hub (highest betweenness); pink = net giver (out-degree > in-degree); cyan = net receiver. Arrows indicate the direction of citation flow; edge width scales with the number of citations between two authors. The inset box reports node cou…
Figure 8. Citation mixing matrix (r = 0.71, Q = 0.97; strong homophily). Row-normalised proportion of citations directed within and across tiers. Cell values show the conditional probability that an author in the row tier cites an author in the column tier. The Brewer RdBu colormap is centred at 0.5. Below: diagonal average (84.0%) summarises overall within-tier preference.
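The effect size in Figure 4, Cliff's δ, equals the probability that a randomly drawn Case value exceeds a randomly drawn Control value, minus the probability of the reverse. It therefore ranges from −1 to 1 and, as the caption states, is positive when Cases tend to score higher. The sketch below is a direct pairwise implementation; the function name is ours.

```python
def cliffs_delta(cases, controls):
    """Cliff's delta = P(case > control) - P(case < control) over all cross-group pairs."""
    if not cases or not controls:
        raise ValueError("both groups need at least one value")
    greater = less = 0
    for x in cases:
        for y in controls:
            if x > y:
                greater += 1
            elif x < y:
                less += 1
    return (greater - less) / (len(cases) * len(controls))
```

This loop is quadratic in the group sizes. For large cells, the same value follows from the Mann-Whitney U statistic (counting ties as one half) via δ = 2U/(nm) − 1.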
Original abstract

This exploratory study examines how low-impact journals, defined through subject-normalized Eigenfactor percentiles, are associated with denser and more reciprocating patterns of author-to-author citations. Using Crossref records, we assign journals to broad subject areas, compute subject-specific Eigenfactor scores, propagate venue quality to works and authors, match authors in low- (Case) versus high-influence (Control) venues by subject and h5, and analyze citation edges for cohesion and anomalies. Across a 10% sample of 9,431 matched pairs, authors in low-impact venues exhibit significantly higher cohesion: 6.7x higher co-author citation rates and 4.7x higher reciprocity in the aggregate Case-Control comparison. A subject-aware hybrid detection pipeline flags 277 outliers with 93.5% Case purity; these outliers display an 11x clique-strength lift relative to non-outliers, revealing a stark "Two Worlds" segregation (r = 0.71) where low-impact venues operate as closed citation economies. The largest detected component (n = 23) displays a hub-and-spoke topology in which peripheral "Sycophants" funnel citations to central "Beneficiaries" through coordinated bursts, confirming a directed flow imbalance rather than reciprocal exchange among equals. Overall, cohesion, rather than broad asymmetry, accounts for the main Case-Control differences, suggesting that low-impact venues foster segregated, inward-looking citation economies that distort bibliometric indicators.
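A tier mixing matrix like the one in Figure 8 can illustrate the "Two Worlds" segregation. The sketch below assumes that the reported r is Newman's assortativity coefficient for categorical labels, $r = (\sum_i e_{ii} - \sum_i a_i b_i)/(1 - \sum_i a_i b_i)$. Here $e_{ij}$ is the share of all citations going from tier $i$ to tier $j$, and $a_i$ and $b_i$ are the row and column sums. That reading of r, the function name, and the use of the mean row-normalised diagonal as the "diagonal average" are all assumptions. The sketch does not compute the modularity Q.

```python
from collections import Counter


def mixing_summary(edges, tier):
    """Tier mixing statistics for a directed citation graph.

    edges: iterable of (citing_author, cited_author); tier: {author: tier_label}.
    Returns (row-normalised matrix, mean of its diagonal, assortativity r).
    """
    counts = Counter((tier[u], tier[v]) for u, v in edges)
    labels = sorted({label for pair in counts for label in pair})
    if len(labels) < 2:
        raise ValueError("need citations touching at least two tiers")
    total = sum(counts.values())

    # Row-normalised: P(cited author's tier = j | citing author's tier = i).
    row_norm = {}
    for i in labels:
        row_total = sum(counts[(i, j)] for j in labels)
        row_norm[i] = {j: counts[(i, j)] / row_total if row_total else 0.0 for j in labels}
    diag_avg = sum(row_norm[i][i] for i in labels) / len(labels)

    # Assortativity uses the matrix normalised by the total number of citations.
    e = {(i, j): counts[(i, j)] / total for i in labels for j in labels}
    a = {i: sum(e[(i, j)] for j in labels) for i in labels}  # row sums
    b = {j: sum(e[(i, j)] for i in labels) for j in labels}  # column sums
    trace = sum(e[(i, i)] for i in labels)
    expected = sum(a[i] * b[i] for i in labels)
    r = (trace - expected) / (1 - expected)
    return row_norm, diag_avg, r
```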

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. This exploratory study uses Crossref data to examine citation cohesion in low-impact journals (defined via subject-normalized Eigenfactor percentiles). Authors from low-impact (Case) and high-impact (Control) venues are matched by subject area and h5-index; across a 10% sample of 9,431 pairs, the analysis reports 6.7x higher co-author citation rates and 4.7x higher reciprocity in the Case group. A subject-aware hybrid detection pipeline identifies 277 outliers (93.5% Case purity) with 11x higher clique strength, revealing a 'Two Worlds' segregation (r=0.71) and a hub-and-spoke topology in the largest component (n=23) involving 'Sycophants' and 'Beneficiaries'. The central claim is that low-impact venues operate as closed, inward-looking citation economies that distort bibliometric indicators.

Significance. If the quantitative differences hold after addressing confounding and validation gaps, the work would demonstrate that venue impact level correlates with citation segregation at scale, with implications for the reliability of metrics like Eigenfactor. Strengths include the large matched sample (9,431 pairs), concrete multipliers, and the hybrid outlier pipeline that achieves high reported purity; these provide falsifiable, data-driven claims rather than purely theoretical assertions.

major comments (2)
  1. [Abstract] Abstract (author matching procedure): Matching solely by subject area and h5-index does not control for publication volume or mean collaboration/team size. Authors with identical h5-index can differ substantially in total output, creating more opportunities for internal citations within the Case group independent of venue segregation. This is load-bearing for the central claim, as the 6.7x co-author citation rate and 4.7x reciprocity differences cannot be isolated to venue impact level without balancing these factors.
  2. [Abstract] Abstract (results and pipeline description): The reported multipliers (6.7x, 4.7x), 93.5% Case purity, and r=0.71 correlation lack error bars, confidence intervals, exact statistical tests, or p-values. The hybrid detection pipeline's validation (false-positive rates, ground-truth comparison) is not described, leaving the outlier identification and 'Two Worlds' segregation claim only moderately supported despite the large sample.
minor comments (1)
  1. [Abstract] The terms 'Sycophants' and 'Beneficiaries' are introduced in the abstract without operational definitions or explicit criteria for assignment in the hub-and-spoke component; this reduces clarity for readers.
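One way to make such role labels operational is to reuse the criteria printed in the Figure 7 caption. Under those criteria, the hub is the node with the highest betweenness centrality, a net giver has out-degree greater than in-degree, and every other node is a net receiver. The sketch below implements Brandes' algorithm for unweighted directed graphs using only the standard library. Several choices in it are assumptions rather than the paper's stated definitions: degree counts distinct cited or citing authors, betweenness ignores edge weights, and "Sycophants" and "Beneficiaries" would correspond to net givers and net receivers.

```python
from collections import deque


def betweenness(nodes, adj):
    """Brandes' algorithm: unnormalised betweenness for an unweighted directed graph.

    adj maps every node to the set of nodes it cites.
    """
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        stack, preds = [], {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)   # number of shortest paths from s
        dist = dict.fromkeys(nodes, -1)   # BFS distance from s (-1 = unseen)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


def assign_roles(edges):
    """Label nodes of a citation component as hub / net giver / net receiver.

    edges: (citing_author, cited_author) pairs; repeated pairs collapse to one edge.
    """
    nodes = sorted({n for edge in edges for n in edge})
    adj = {n: set() for n in nodes}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
    in_deg = dict.fromkeys(nodes, 0)
    for u in nodes:
        for v in adj[u]:
            in_deg[v] += 1
    bc = betweenness(nodes, adj)
    hub = max(nodes, key=lambda n: bc[n])  # ties resolved by sorted node order
    roles = {}
    for n in nodes:
        if n == hub:
            roles[n] = "hub"
        elif len(adj[n]) > in_deg[n]:
            roles[n] = "net giver"
        else:
            roles[n] = "net receiver"
    return roles, bc
```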

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on our exploratory study. We address each major comment in detail below, providing our responses and indicating the revisions we will make to the manuscript.

Point-by-point responses
  1. Referee: [Abstract] Abstract (author matching procedure): Matching solely by subject area and h5-index does not control for publication volume or mean collaboration/team size. Authors with identical h5-index can differ substantially in total output, creating more opportunities for internal citations within the Case group independent of venue segregation. This is load-bearing for the central claim, as the 6.7x co-author citation rate and 4.7x reciprocity differences cannot be isolated to venue impact level without balancing these factors.

    Authors: We recognize that additional controls for publication volume and team size could further isolate the effect of venue impact. Although h5-index provides a partial proxy for author productivity, we agree this is a valid concern for the central claim. In the revised version, we will augment the author matching procedure to also match on total publication count and average collaboration size. We will then recompute the co-author citation rates and reciprocity metrics under this stricter matching and report any changes to the 6.7x and 4.7x multipliers. revision: yes

  2. Referee: [Abstract] Abstract (results and pipeline description): The reported multipliers (6.7x, 4.7x), 93.5% Case purity, and r=0.71 correlation lack error bars, confidence intervals, exact statistical tests, or p-values. The hybrid detection pipeline's validation (false-positive rates, ground-truth comparison) is not described, leaving the outlier identification and 'Two Worlds' segregation claim only moderately supported despite the large sample.

    Authors: We agree that the abstract and associated claims would benefit from explicit statistical support and pipeline validation details. In the revised manuscript, we will include bootstrap confidence intervals for the reported multipliers and the correlation coefficient, along with the results of permutation tests for significance. Additionally, we will expand the methods section to describe the validation of the hybrid detection pipeline, including false-positive rates from ground-truth comparisons and the cross-validation procedure used to obtain the 93.5% purity figure. The abstract will be updated to reference these statistical measures. revision: yes
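The promised statistics can be sketched with the Python standard library alone. The sketch assumes that the matched pair is the resampling unit and that a multiplier such as 6.7x is a ratio of group means. It uses a percentile bootstrap for the interval and a two-sided sign-flip permutation test on the paired differences. The function names, defaults, and these methodological choices are illustrative, not the authors' stated procedure.

```python
import random
import statistics


def bootstrap_ratio_ci(case, control, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(case) / mean(control), resampling matched pairs."""
    if len(case) != len(control) or not case:
        raise ValueError("need non-empty, equally long paired samples")
    rng = random.Random(seed)
    n = len(case)
    ratios = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ctrl_mean = statistics.fmean(control[i] for i in idx)
        if ctrl_mean > 0:  # skip resamples where the ratio is undefined
            ratios.append(statistics.fmean(case[i] for i in idx) / ctrl_mean)
    if not ratios:
        raise ValueError("control mean was zero in every resample")
    ratios.sort()
    lo = ratios[int(alpha / 2 * len(ratios))]
    hi = ratios[int((1 - alpha / 2) * len(ratios)) - 1]
    return lo, hi


def paired_permutation_pvalue(case, control, n_perm=5000, seed=0):
    """Two-sided sign-flip permutation test for the mean paired difference."""
    rng = random.Random(seed)
    diffs = [c - k for c, k in zip(case, control)]
    observed = abs(statistics.fmean(diffs))
    extreme = 0
    for _ in range(n_perm):
        # Under the null, each pair's Case/Control labels are exchangeable.
        flipped = statistics.fmean(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # add-one correction keeps p > 0
```

Resampling whole pairs, rather than Case and Control authors independently, preserves the subject and h5 matching inside every bootstrap replicate.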

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

Full rationale

The paper performs an observational empirical analysis on external Crossref citation data, using standard subject-normalized Eigenfactor percentiles to classify low- versus high-impact venues and then computing separate cohesion metrics (co-author citation rates, reciprocity) on the same graph. No load-bearing step reduces a claimed result to a tautology, fitted parameter, or self-citation chain by construction. The matching by subject and h5-index, outlier detection, and aggregate Case-Control comparisons are direct data-driven procedures without self-definitional loops or renamed known results. The analysis is self-contained against external benchmarks and does not invoke any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 1 invented entity

The central claims rest on the assumption that Eigenfactor-based classification and h5/subject matching isolate venue effects, plus post-hoc labeling of network roles; no independent verification of these steps is provided in the abstract.

free parameters (2)
  • low-impact threshold
    Subject-normalized Eigenfactor percentiles used to assign journals to Case group
  • sample fraction
    10% sample of 9,431 matched pairs selected for analysis
axioms (2)
  • domain assumption: Eigenfactor scores and h5-index provide valid proxies for journal influence and author productivity, respectively
    Used to define Case/Control groups and perform matching
  • ad hoc to paper: The hybrid detection pipeline accurately identifies citation cliques without excessive false positives
    Invoked to flag 277 outliers with reported 93.5% Case purity
invented entities (1)
  • Sycophants and Beneficiaries (no independent evidence)
    purpose: Descriptive labels for peripheral and central nodes in the largest detected component
    Assigned to describe hub-and-spoke topology; no independent evidence provided
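The low-impact threshold listed among the free parameters can be made concrete with a subject-normalized percentile. Each journal is ranked only against journals in the same subject, so fields with generally higher Eigenfactor scores do not crowd out the others. This is a minimal sketch. Several details are hypothetical, because this page does not give the paper's thresholds: the 25th and 75th percentile cutoffs, the rank-over-count percentile, the function name, and leaving middle-percentile journals unlabeled.

```python
from collections import defaultdict


def label_journals(scores, low_pct=0.25, high_pct=0.75):
    """Assign Case/Control labels from within-subject Eigenfactor percentiles.

    scores: list of (journal, subject, eigenfactor).
    Percentile = rank within subject / number of journals in that subject;
    ties in score are broken by journal name.
    Returns {journal: "Case" | "Control" | None}.
    """
    by_subject = defaultdict(list)
    for journal, subject, ef in scores:
        by_subject[subject].append((ef, journal))
    labels = {}
    for items in by_subject.values():
        items.sort()  # ascending Eigenfactor within the subject
        n = len(items)
        for rank, (_, journal) in enumerate(items, start=1):
            pct = rank / n
            if pct <= low_pct:
                labels[journal] = "Case"
            elif pct >= high_pct:
                labels[journal] = "Control"
            else:
                labels[journal] = None
    return labels
```

Re-running the Case-Control comparison for several values of `low_pct` would show how sensitive the reported gap is to this free parameter.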

pith-pipeline@v0.9.0 · 5564 in / 1446 out tokens · 97236 ms · 2026-05-13T04:09:00.228433+00:00 · methodology

