arxiv: 2604.22458 · v1 · submitted 2026-04-24 · 💻 cs.DL

Opening Pandora's box: Paper mills in conference proceedings

Anna Abalkina, Marie Kunešová, Yagmur Ozturk, Solal Pirelli

Pith reviewed 2026-05-08 08:44 UTC · model grok-4.3

classification 💻 cs.DL

keywords paper mills · conference proceedings · IEEE · scientific misconduct · academic integrity · title matching · social media · fraudulent publications

The pith

Paper mills offered on social media the titles of 1,720 papers that later appeared in 286 IEEE conference proceedings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that paper mills sell authorship on scientific work and that many of those papers end up published in conference proceedings. The authors gathered over 4,000 sales offers from social media. They then matched the offered titles to papers in IEEE conference proceedings using semi-automated checks plus human review. They located 1,720 matching papers across 286 proceedings. These papers were written by more than 6,500 people from over 3,500 institutions in 55 countries. In one conference, such papers made up 23.51 percent of the content. The papers display unusual patterns:
- a predominance of six-author papers;
- a high diversity of affiliations per paper;
- citation manipulation;
- content irregularities.

If the matches hold, organized fraud has moved beyond journals into conference proceedings, a primary publication venue in parts of computer science and engineering.

Core claim

We collected more than 4,000 unique publication offers posted on over 200 social media channels and matched their titles to papers published in IEEE conference proceedings through semi-automated methods and human assessment. This process identified 1,720 papers across 286 proceedings, representing up to 23.51 percent of the content of an individual conference. These papers were co-authored by more than 6,500 researchers from over 3,500 affiliations in 55 countries. They exhibited collaboration anomalies, a high diversity of affiliations per paper, citation manipulation, a predominance of six-author papers, and content-based irregularities.

What carries the argument

Title-based matching of social-media sales offers for papers to actual publications in IEEE conference proceedings, followed by human verification of the matches.
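The sketch below covers only the automated half of this step; it is not the authors' code.

It assumes the normalization described in the simulated rebuttal: lowercasing, punctuation removal, and whitespace standardization. Only identical normalized titles are treated as candidates, and every candidate is left for human review. The function names are hypothetical. The idea of a control list of titles for estimating chance matches is likewise an illustrative assumption.

```python
import re


def normalize(title: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace.

    Replacing punctuation with a space (rather than deleting it) is an
    assumption; it keeps "intrusion-detection" equal to "intrusion detection".
    """
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def build_index(proceedings: list[str]) -> dict[str, list[str]]:
    """Map each normalized proceedings title to the original record titles."""
    index: dict[str, list[str]] = {}
    for record in proceedings:
        index.setdefault(normalize(record), []).append(record)
    return index


def candidate_pairs(offers: list[str], index: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (offer, record) pairs whose normalized titles are identical.

    These are candidates only; in the described workflow each one still
    goes to human assessment.
    """
    return [(offer, record) for offer in offers for record in index.get(normalize(offer), [])]


def spurious_match_rate(control_titles: list[str], index: dict[str, list[str]]) -> float:
    """Share of control titles (assumed unrelated to any offer) that hit the index."""
    hits = sum(1 for title in control_titles if normalize(title) in index)
    return hits / len(control_titles)
```

Exact matching after normalization misses reworded titles. Generic engineering titles, however, can still collide exactly. A control-title rate is one way to gauge how often the second problem occurs.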

If this is right

Paper mills operate as a public, organized market that not only sells papers but also infiltrates multiple parts of the research ecosystem.
Conferences, which serve as primary publication venues in some fields, contain a measurable share of fraudulent content.
The identified papers involve researchers and institutions spread across 55 countries, indicating global reach.
Anomalies such as six-author predominance and citation manipulation accompany the title matches.
The scale reaches thousands of papers and authors, showing the problem is not isolated.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Organizers could run title checks against known sales lists before accepting submissions.
The method might be applied to journal articles to test whether the same infiltration exists there.
Some listed authors may have been added without full knowledge, raising questions about consent in collaborations.
Widespread use of these detection steps could reduce the incentive for mills by lowering the success rate of their sales.
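The first extension, a title check at submission time, might look like the sketch below. The paper does not propose it in this form.

The sketch rests on three assumptions:
- organizers hold a list of advertised titles;
- similarity is scored with Python's standard-library `difflib.SequenceMatcher`;
- 0.9 is an arbitrary threshold.

Fuzzy scoring is a swapped-in choice meant to catch lightly reworded titles. Flagged submissions would still need human review.

```python
import re
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def flag_submissions(submissions, advertised, threshold=0.9):
    """Return (submission, advertised title, similarity) for likely matches.

    `threshold` is a hypothetical cut-off, not a value taken from the study.
    """
    normalized_ads = [(ad, normalize(ad)) for ad in advertised]
    flagged = []
    for sub in submissions:
        sub_norm = normalize(sub)
        for ad, ad_norm in normalized_ads:
            # ratio() returns 2*M/T: matched characters over total length.
            score = SequenceMatcher(None, sub_norm, ad_norm).ratio()
            if score >= threshold:
                flagged.append((sub, ad, round(score, 3)))
    return flagged
```

Comparing every submission with every advertised title is quadratic. That is acceptable for a single conference's submissions, but a large list of offers would call for an index first.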

Load-bearing premise

That matching paper titles from social media sales offers to published conference papers accurately identifies paper-mill products with a low rate of false positives caused by chance similarities.

What would settle it

A manual check of the full text and author records for a random sample of the 1,720 matched papers that finds no signs of misconduct or any link to the original sales posts.
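Such a check needs a reproducible sample and an honest error bar. The sketch below draws a seeded random sample. It then computes a Wilson score interval for the share of sampled matches that turn out to be false positives. The sample size, the seed, and the audit outcome used here are hypothetical.

```python
import math
import random


def draw_audit_sample(paper_ids, k, seed=2026):
    """Reproducible simple random sample of k matched papers (seed is arbitrary)."""
    rng = random.Random(seed)
    return rng.sample(list(paper_ids), k)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


sample = draw_audit_sample(range(1720), k=100)
low, high = wilson_interval(0, 100)  # hypothetical: no false positives found
```

Suppose none of 100 sampled matches turned out to be false positives. The 95% upper bound would still be roughly 3.7 percent, so a small audit bounds the error rate rather than proving it is zero.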

Original abstract

Paper mills are a growing threat to the integrity of science, yet their penetration in conference proceedings remains underexplored despite conferences being more important than journals in some scientific subfields. This study aims to identify papers in conference proceedings whose titles have been offered for sale on social media platforms. We collected more than 4,000 unique publication offers from more than 200 social media channels and used semi-automated methods along with human assessment to match offers with papers published in IEEE conference proceedings. We identified 1,720 papers in 286 IEEE conference proceedings, accounting for up to 23.51% of an individual conference. These problematic papers are co-authored by more than 6,500 researchers from over 3,500 affiliations in 55 countries. The identified papers demonstrate collaboration anomalies, high diversity of affiliations per paper, citation manipulation, a predominance of six-author papers, and content-based irregularities. Our findings show that paper mills are a large, organized, and often public market that commercializes scientific misconduct, not limited to papers, but infiltrating multiple parts of the research ecosystem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript collects more than 4,000 unique publication offers from over 200 social-media channels and applies semi-automated title matching followed by human assessment to identify matches with papers published in IEEE conference proceedings. It reports 1,720 matched papers appearing in 286 conferences (up to 23.51% of papers in an individual conference), involving more than 6,500 co-authors from over 3,500 affiliations across 55 countries, and documents associated anomalies including high affiliation diversity per paper, citation manipulation, a predominance of six-author papers, and content irregularities.

Significance. If the title-matching procedure is shown to have low false-positive rates, the work supplies the first large-scale observational evidence of organized paper-mill activity inside conference proceedings rather than journals. The reported scale, international reach, and multi-faceted anomalies would constitute a concrete, falsifiable data point for the scholarly-communication community and could directly inform detection policies at IEEE and similar venues.

major comments (2)

[Methods (title-matching procedure)] The central prevalence claims (1,720 papers, up to 23.51% of a conference) rest entirely on the accuracy of linking >4,000 social-media offers to published IEEE papers via titles. The Methods section describes only 'semi-automated methods along with human assessment'. It supplies no quantified false-positive rate, no count of discarded candidate matches, no inter-rater agreement statistics, and no baseline comparison against randomly paired titles. Generic or formulaic titles, common in engineering fields, could generate spurious matches. Without these diagnostics, the downstream aggregates (6,500 authors, 3,500 affiliations) cannot be treated as reliable counts of paper-mill output.
[Results (anomaly analysis)] The Results section presents collaboration anomalies, citation manipulation, and content irregularities as evidence of paper-mill activity, yet provides no control sample of non-matched IEEE conference papers and no statistical test establishing that the observed patterns differ significantly from background rates in the same venues or subfields.
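Of the diagnostics requested in the first comment, inter-rater agreement is the simplest to illustrate. The sketch below computes Cohen's kappa for two reviewers who each labeled the same candidate pairs as match (1) or non-match (0). The labels are made up. This summary does not state whether the study used two independent raters.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in labels) / n**2
    if expected == 1:
        # Both raters used one identical label throughout; kappa is
        # undefined, so return 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
b = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
kappa = cohens_kappa(a, b)  # 80% raw agreement, 50% expected by chance
```

Raw agreement alone would overstate reliability here. Half of the agreement is expected by chance, which leaves a kappa of 0.6.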

minor comments (2)

[Abstract and Methods] The abstract states 'more than 4,000 unique publication offers' but the exact collection window, channel sampling strategy, and deduplication criteria are not stated; these details are required for reproducibility.
[Results] Clarify whether the 23.51% figure represents the single highest conference or the upper tail of a distribution; a table or histogram showing the percentage of matched papers per conference would strengthen the claim.
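The table requested in the second minor comment is simple arithmetic: matched papers divided by the total number of papers in each proceedings volume. The sketch below uses placeholder conference names and counts, not figures from the study. Given the "up to" wording, the top row would correspond to the reported 23.51% maximum. The full share column is the distribution the referee asks for.

```python
def conference_shares(matched, totals):
    """Percentage of each conference's papers that were matched to offers.

    matched: {conference: number of matched papers}
    totals:  {conference: total papers in the proceedings}
    Returns (conference, matched, total, share %) rows, highest share first.
    """
    rows = []
    for name, total in totals.items():
        hits = matched.get(name, 0)
        rows.append((name, hits, total, 100.0 * hits / total))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


def print_table(rows):
    print(f"{'conference':<12}{'matched':>9}{'total':>7}{'share %':>9}")
    for name, hits, total, share in rows:
        print(f"{name:<12}{hits:>9}{total:>7}{share:>9.2f}")


# Hypothetical counts for illustration only.
rows = conference_shares({"Conf-A": 12, "Conf-B": 3}, {"Conf-A": 200, "Conf-B": 150, "Conf-C": 400})
print_table(rows)
```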

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review, which highlights important areas for strengthening the transparency and rigor of our analysis. We address each major comment below and will incorporate revisions to the manuscript.

Point-by-point responses

Referee: The central prevalence claims (1,720 papers, up to 23.51% of a conference) rest entirely on the accuracy of linking >4,000 social-media offers to published IEEE papers via titles. The Methods section describes only 'semi-automated methods along with human assessment'. It supplies no quantified false-positive rate, no count of discarded candidate matches, no inter-rater agreement statistics, and no baseline comparison against randomly paired titles. Generic or formulaic titles, common in engineering fields, could generate spurious matches. Without these diagnostics, the downstream aggregates (6,500 authors, 3,500 affiliations) cannot be treated as reliable counts of paper-mill output.

Authors: We agree that the Methods section would benefit from expanded detail on the title-matching procedure to allow readers to assess reliability. The process began with automated normalization of titles (lowercasing, punctuation removal, and whitespace standardization) to generate candidate pairs between the collected offers and IEEE proceedings records, followed by human assessment in which all candidates were reviewed for exact title correspondence and contextual consistency with the original offer. We will revise the Methods to explicitly describe these steps, report the total number of candidates reviewed and the number discarded due to mismatches, and note that human review was performed with consensus resolution for any initial differences. A formal random baseline comparison was not included in the original submission; we will add one by applying the identical matching procedure to a sample of randomly selected IEEE paper titles to quantify the expected spurious match rate. We expect this rate to be low because the social-media offers frequently contain additional specific details (e.g., author counts or topical keywords) that were cross-checked during human review, reducing the chance of false positives from generic engineering titles. revision: partial
Referee: The Results section presents collaboration anomalies, citation manipulation, and content irregularities as evidence of paper-mill activity, yet provides no control sample of non-matched IEEE conference papers and no statistical test establishing that the observed patterns differ significantly from background rates in the same venues or subfields.

Authors: The anomaly analysis is presented as descriptive characterization of the matched papers to illustrate patterns consistent with organized paper-mill activity, rather than as standalone statistical proof. We acknowledge that the absence of a control group limits the ability to demonstrate that these patterns deviate significantly from typical IEEE conference papers. We will revise the Results section to include a control sample consisting of randomly selected non-matched papers from the same 286 conferences and time period. We will then report comparative statistics (e.g., distributions of author counts, affiliation diversity per paper, and citation patterns) and apply appropriate statistical tests such as chi-squared or Mann-Whitney U tests to establish whether the differences are significant. This addition will make the supporting evidence more robust while preserving the primary contribution of the title-matching results. revision: yes
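One categorical comparison proposed in the rebuttal is the share of six-author papers among matched versus control papers. For that, a Pearson chi-squared test on a 2x2 table is enough. The sketch below is self-contained (no SciPy) and uses hypothetical counts. For author-count or affiliation distributions, `scipy.stats.mannwhitneyu` provides the Mann-Whitney U test mentioned alongside it.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

        rows: matched papers, control papers
        cols: six-author papers, all other author counts
        [[a, b],
         [c, d]]
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical: 30 of 100 matched vs 10 of 100 control papers have six authors.
stat, p = chi_squared_2x2(30, 70, 10, 90)  # stat = 12.5, p about 0.0004
```

With small expected cell counts (below about 5), an exact test such as Fisher's would be the safer choice.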

Circularity Check

0 steps flagged

No significant circularity; purely observational data collection and matching

Full rationale

The paper performs an empirical study by collecting over 4,000 publication offers from social media, applying semi-automated title matching plus human assessment to link them with IEEE conference papers, and then tabulating counts (1,720 papers), author/affiliation aggregates, and observed anomalies such as collaboration patterns. No equations, fitted parameters, predictions, or first-principles derivations appear in the reported chain. The central prevalence figures are direct outputs of the matching procedure rather than quantities that reduce by construction to any self-citation, ansatz, or renamed input. Self-citations, if present, are not load-bearing for the counts or anomalies. The work is therefore self-contained as standard observational analysis without circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the accuracy of title matching and the interpretation of observed anomalies as paper-mill signatures. No free parameters are fitted; the study uses external social-media data and published conference records.

axioms (1)

Domain assumption: titles advertised on social media for sale correspond to papers produced by paper mills and later published in IEEE proceedings.
Invoked in the matching procedure described in the abstract.

pith-pipeline@v0.9.0 · 5501 in / 1288 out tokens · 38950 ms · 2026-05-08T08:44:27.317879+00:00 · methodology
