arxiv: 2603.28054 · v2 · pith:GC6YFGC4new · submitted 2026-03-30 · 💻 cs.CL

Who Wrote the Book? Detecting and Attributing LLM Ghostwriters

classification 💻 cs.CL
keywords trace · book · ghostwritebench · lightweight · works · achieves · across · another

In this paper, we introduce GhostWriteBench, a dataset for LLM authorship attribution. It comprises long-form texts (50K+ words per book) generated by frontier LLMs, and is designed to test generalisation across multiple out-of-distribution (OOD) dimensions, including unseen domains and unseen LLM authors. We also propose TRACE, a novel, interpretable, and lightweight fingerprinting method that works for both open- and closed-source models. TRACE creates the fingerprint by capturing token-level transition patterns (e.g., word rank) estimated by another lightweight language model. Experiments on GhostWriteBench demonstrate that TRACE achieves state-of-the-art performance, remains robust in OOD settings, and works well in limited training data scenarios.
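
The abstract does not spell out how the fingerprint is computed, so the following is only a toy sketch of the general idea of rank-transition fingerprinting, not the TRACE method itself. It assumes a bigram counter as the stand-in "lightweight language model", three hypothetical rank buckets (rank 1, ranks 2 to 10, everything else), and a row-normalised matrix of transitions between the buckets of consecutive tokens. All function names and bucket boundaries are illustrative choices.

```python
from collections import Counter, defaultdict

UNSEEN_RANK = 10**6  # hypothetical rank for tokens the scorer never predicted


def train_bigram(tokens):
    """Toy stand-in for the lightweight scoring model: bigram counts."""
    following = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        following[prev][nxt] += 1
    return following


def token_ranks(tokens, following):
    """Rank of each observed token among the scorer's candidates (1 = most likely)."""
    ranks = []
    for prev, nxt in zip(tokens, tokens[1:]):
        candidates = following.get(prev, Counter())
        # Sort by count (descending); break ties alphabetically for determinism.
        ordered = [w for w, _ in sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))]
        ranks.append(ordered.index(nxt) + 1 if nxt in ordered else UNSEEN_RANK)
    return ranks


def bucket(rank):
    """Map a rank to one of three assumed buckets: top-1, top-10, other."""
    if rank == 1:
        return 0
    if rank <= 10:
        return 1
    return 2


def fingerprint(ranks, n_buckets=3):
    """Row-normalised matrix of transitions between consecutive rank buckets."""
    counts = [[0] * n_buckets for _ in range(n_buckets)]
    buckets = [bucket(r) for r in ranks]
    for a, b in zip(buckets, buckets[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Usage: fit the toy scorer on reference text, then fingerprint a candidate text.
scorer = train_bigram("the cat sat on the mat the cat ran".split())
candidate_ranks = token_ranks("the cat ran on the dog".split(), scorer)
candidate_fp = fingerprint(candidate_ranks)
```

In a sketch like this, the flattened matrix would serve as a feature vector for a downstream classifier over candidate LLM authors. Because the scorer only needs the generated text, not the generator's logits, the same pipeline applies to open- and closed-source models alike.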
