Assessing the Applicability of Authorship Verification Methods
Pith reviewed 2026-05-25 17:33 UTC · model grok-4.3
The pith
Authorship verification methods handle short informal chats and time-separated documents but all fail on cross-topic cases.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper proposes explicit criteria and properties for characterizing AV approaches, then trains, optimizes, and evaluates twelve existing methods on three self-compiled corpora, each targeting a distinct aspect of forensic applicability. Some of the methods succeed on very challenging cases: 72.7 percent accuracy on 250-character informal chat conversations, and over 75 percent accuracy on scientific documents written an average of 15.6 years apart. All methods, however, are prone to failure in cross-topic verification cases.
What carries the argument
Three self-compiled corpora, each constructed to isolate one forensic applicability factor (short informal length, temporal separation, and topic variation), used to train, optimize, and test twelve AV methods.
If this is right
- Methods that reach 72.7 percent on 250-character chats become candidates for forensic analysis of short informal messaging logs.
- Methods exceeding 75 percent on documents separated by 15.6 years on average can be considered for cases involving writing produced years apart.
- The universal weakness on cross-topic cases implies that any deployable AV system must incorporate safeguards or additional features that reduce topic dependence.
- The proposed characterization criteria supply a repeatable basis for comparing new AV methods against the same forensic dimensions.
- Forensic use of AV requires matching the chosen method to the expected document characteristics rather than treating all methods as interchangeable.
Where Pith is reading between the lines
- If cross-topic failure is the dominant limitation, pairing AV methods with separate topic classifiers could improve reliability in mixed-topic investigations.
- The same three-corpus testing design could be applied to non-English texts or additional genres to map method applicability more broadly.
- The reported accuracies suggest AV outputs would function best as one piece of supporting evidence rather than decisive proof in legal settings.
- Practitioners could adopt the paper's criteria as a checklist when selecting or developing AV tools for specific case types.
Load-bearing premise
The three self-compiled corpora accurately capture the distributions of document length, temporal separation, and topic variation that occur in real forensic verification tasks.
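One way to probe this premise is to summarize each corpus along the three dimensions it names and compare the summaries against whatever reference collection is available. The sketch below is illustrative only: the function names, the 50-character bin width, and the input shapes (raw text strings, pairs of publication years, one topic label per document) are assumptions made here, not details taken from the paper.

```python
import math
from collections import Counter
from statistics import mean, median


def length_histogram(texts, bin_width=50):
    """Count documents per character-length bin (bin start -> count)."""
    bins = Counter((len(t) // bin_width) * bin_width for t in texts)
    return dict(sorted(bins.items()))


def temporal_gap_stats(year_pairs):
    """Summarize the absolute gap in years between the two documents of each pair."""
    gaps = [abs(a - b) for a, b in year_pairs]
    return {"mean": mean(gaps), "median": median(gaps), "max": max(gaps)}


def topic_entropy(topic_labels):
    """Shannon entropy (in bits) of the topic-label distribution."""
    counts = Counter(topic_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Computing the same three summaries on a second collection would give a rough, side-by-side view of how far the two distributions diverge. It would not by itself establish that either collection matches real forensic casework.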
What would settle it
A new test showing that one or more of the twelve methods reaches high accuracy on cross-topic pairs drawn from the same three corpora, or external data showing that the corpora diverge substantially from actual forensic case distributions.
Original abstract
Authorship verification (AV) is a research subject in the field of digital text forensics that concerns itself with the question, whether two documents have been written by the same person. During the past two decades, an increasing number of proposed AV approaches can be observed. However, a closer look at the respective studies reveals that the underlying characteristics of these methods are rarely addressed, which raises doubts regarding their applicability in real forensic settings. The objective of this paper is to fill this gap by proposing clear criteria and properties that aim to improve the characterization of existing and future AV approaches. Based on these properties, we conduct three experiments using 12 existing AV approaches, including the current state of the art. The examined methods were trained, optimized and evaluated on three self-compiled corpora, where each corpus focuses on a different aspect of applicability. Our results indicate that part of the methods are able to cope with very challenging verification cases such as 250 characters long informal chat conversations (72.7% accuracy) or cases in which two scientific documents were written at different times with an average difference of 15.6 years (> 75% accuracy). However, we also identified that all involved methods are prone to cross-topic verification cases.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes criteria and properties for characterizing authorship verification (AV) methods and evaluates 12 existing approaches (including state-of-the-art) on three self-compiled corpora, each designed to target a distinct aspect of applicability in forensic settings. It reports that certain methods achieve 72.7% accuracy on 250-character informal chat conversations and >75% accuracy on scientific documents separated by an average of 15.6 years, while all methods fail on cross-topic cases.
Significance. If the self-compiled corpora faithfully represent real forensic distributions, the results would usefully demonstrate that some AV methods can handle extreme length and temporal constraints while exposing a shared vulnerability to topic shifts, thereby guiding method selection and future development in digital text forensics.
major comments (2)
- [Corpus construction (Section 3)] The three self-compiled corpora are load-bearing for the applicability conclusions, yet no quantitative anchoring is supplied (topic entropy, length histograms, or temporal-gap statistics) against established forensic collections; without this, the reported accuracies may reflect residual topic leakage rather than authorship signal.
- [Results (Section 4 and abstract)] The specific accuracy figures (72.7% on the chat corpus, >75% on the temporal corpus) are presented without statistical significance tests, confidence intervals, or baseline comparisons, leaving the empirical support for the central claims only moderately robust.
minor comments (1)
- [Properties and criteria] The exact operational definition of 'cross-topic' pairs should be stated more explicitly when the properties are introduced, to ensure reproducibility of the failure case.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and indicate planned revisions.
- Referee: Corpus construction (Section 3): the three self-compiled corpora are load-bearing for the applicability conclusions, yet no quantitative anchoring is supplied (topic entropy, length histograms, or temporal-gap statistics) against established forensic collections; without this, the reported accuracies may reflect residual topic leakage rather than authorship signal.
  Authors: Our three corpora were constructed to isolate distinct forensic challenges (extreme brevity in informal text, long temporal gaps in scientific writing, and explicit cross-topic shifts) that are not simultaneously represented in standard collections such as PAN. The cross-topic corpus enforces topic separation by design, and the uniform failure of all 12 methods on it indicates that topic leakage does not explain the results on the other two corpora. We nevertheless agree that descriptive statistics for our own data would improve transparency; the revised manuscript will add length histograms and temporal-gap distributions to Section 3. Full quantitative anchoring (e.g., topic entropy) against external forensic datasets is not feasible without re-collecting or re-annotating those datasets under our criteria, but we will add a discussion of this limitation.
  Revision: partial
- Referee: Results (Section 4 and abstract): the specific accuracy figures (72.7% on the chat corpus, >75% on the temporal corpus) are presented without statistical significance tests, confidence intervals, or baseline comparisons, leaving the empirical support for the central claims only moderately robust.
  Authors: We accept that the reported accuracies would be more robust with formal statistical support. The revised Section 4 will include bootstrap-derived 95% confidence intervals for all accuracy figures and pairwise significance tests (McNemar’s test) between methods. The evaluation already compares 12 methods that span simple baselines to the state of the art; we will additionally report an explicit random-guess baseline and a majority-class baseline to make this comparison explicit.
  Revision: yes
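The statistical support described in this response can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: it assumes binary same-author/different-author labels held in plain lists, a percentile bootstrap over verification cases, and the exact (binomial) form of McNemar's test. All function names and defaults are hypothetical.

```python
import random
from math import comb


def accuracy(y_true, y_pred):
    """Fraction of verification cases decided correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for accuracy."""
    rng = random.Random(seed)
    n = len(y_true)
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    # Resample cases with replacement and recompute accuracy each time.
    stats = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples))
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value, computed from the discordant cases only."""
    a_only = sum(a == t and b != t for t, a, b in zip(y_true, pred_a, pred_b))
    b_only = sum(a != t and b == t for t, a, b in zip(y_true, pred_a, pred_b))
    n = a_only + b_only
    if n == 0:
        return 1.0  # the two methods never disagree in correctness
    k = min(a_only, b_only)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)


def majority_baseline(y_true):
    """Accuracy of always predicting the most frequent label."""
    return max(y_true.count(label) for label in set(y_true)) / len(y_true)
```

On a corpus with equal numbers of same-author and different-author cases, both the random-guess and the majority-class baseline sit at 50 percent, so a figure such as 72.7 percent is only informative once the class balance and the interval width are reported alongside it.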
Circularity Check
No circularity; empirical evaluation on held-out corpora
The paper performs direct empirical measurement: 12 AV methods are trained, optimized, and evaluated on three self-compiled corpora, with reported accuracies (e.g., 72.7% on short chats, >75% on temporal gaps) obtained from held-out test data. No equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the chain. The central claims rest on observable performance numbers rather than any reduction to inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The proposed criteria and properties suffice to characterize the applicability of AV methods in real forensic settings