Where Does Authorship Signal Emerge in Encoder-Based Language Models?

Florian Cafiero; Francis Kulumba; Guillaume Vimont; Laurent Romary

arxiv: 2605.19908 · v1 · pith:YU2U2MWXnew · submitted 2026-05-19 · 💻 cs.CL

Where Does Authorship Signal Emerge in Encoder-Based Language Models?

Francis Kulumba , Guillaume Vimont , Laurent Romary , Florian Cafiero This is my paper

Pith reviewed 2026-05-20 06:07 UTC · model grok-4.3

classification 💻 cs.CL

keywords authorship attributionmechanistic interpretabilityencoder language modelsscoring mechanismslayer-wise analysisstylistic featurescausal interventionmean pooling

0 comments

The pith

The scoring mechanism alone decides the layer where encoder models consolidate authorship signals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that authorship attribution models using identical encoders, data, and training loss still vary four-fold in accuracy based only on how they score representations. Stylistic cues such as word length, punctuation density, and function-word frequency appear equally at every layer across models, including untouched control encoders. Causal interventions then reveal that the scorer itself controls when the encoder gathers the authorship signal into usable form. Mean pooling drives early-to-mid-layer consolidation while late interaction pushes the same process to later layers. This timing difference traces directly to each scorer's gradient structure and produces separate training paths.

Core claim

Authorship attribution models fine-tuned with the same pretrained encoder, data, and loss differ up to four-fold in performance solely due to their scoring mechanism. Mechanistic tools show stylistic features remain available at every layer in every encoder, including off-the-shelf controls. Causal interventions establish that the scorer dictates consolidation timing: mean pooling forces the signal to consolidate by early-to-mid layers, whereas late interaction defers consolidation to later layers. The difference follows from the distinct gradient structures of the two scorers and produces correspondingly distinct learning trajectories.

What carries the argument

Causal intervention that isolates layer-wise authorship signal under mean pooling versus late interaction scorers.

If this is right

Mean pooling models learn to rely on early-layer representations while late-interaction models continue refining signal in deeper layers.
Training dynamics diverge because each scorer back-propagates authorship gradients through different depths.
Performance gaps arise from the timing of signal consolidation rather than from differences in what features the encoder can represent.
Changing only the final scorer can move the effective depth at which an encoder solves the same stylistic task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Designers of style-sensitive classifiers may improve results by deliberately choosing scorers that delay consolidation when deeper contextual cues matter.
The same layer-timing logic could explain performance differences in other attribute classification tasks that rely on subtle surface patterns.
Directly editing gradient flow during training might let practitioners control consolidation depth without swapping scorers.

Load-bearing premise

Stylistic features stay equally detectable at every layer in every model even after fine-tuning, so performance gaps cannot come from uneven feature availability.

What would settle it

A controlled experiment that measures authorship attribution accuracy after zeroing stylistic features only in early layers and finds mean-pooling models degrade far more than late-interaction models.

Figures

Figures reproduced from arXiv: 2605.19908 by Florian Cafiero, Francis Kulumba, Guillaume Vimont, Laurent Romary.

**Figure 1.** Figure 1: Conceptual overview. Left: The pretrained language model encodes stylistic features at every layer, regardless of fine-tuning. Center: Two scoring mechanisms read out these features differently. Mean pooling averages all tokens into a single vector. Late interaction (LI) (Khattab and Zaharia, 2020) compares tokens directly. Right: Causal intervention reveals that the scoring mechanism determines where the … view at source ↗

**Figure 2.** Figure 2: Token length distributions for positive (blue) [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: LISA probe R2 heatmaps at the final checkpoint. Rows are stylistic feature categories. Columns are encoder layers. The three fine-tuned models produce nearly identical heatmaps. Word length is the most readable feature (R2 ≈ 0.57), followed by capitalization rate, type–token ratio, and punctuation density. 0 5 10 15 20 Patch layer index 0.0 0.2 0.4 0.6 0.8 1.0 Fraction rank-recovered Rank recovery all mode… view at source ↗

**Figure 4.** Figure 4: Rank recovery across the three models. Each panel shows one tier. Purple: layerwise (mean pooling), orange: LI, green: PLI n=2. Dashed line: chance (0.5). Mean pooling crosses chance at layer 9, while both interaction models cross at layers 14–16. The six-layer gap is consistent across all three tiers. layer 13. This pattern is consistent across all three tiers. On Tier C, all models show slightly abovech… view at source ↗

**Figure 5.** Figure 5: Score sensitivity per layer. Mean |s (ℓ) patched − scorrupt| when restoring clean activations at layer ℓ. LI (orange) is most sensitive, PLI (green) is intermediate, layerwise (purple) is an order of magnitude lower. intermediate checkpoints ( [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Training dynamics. Mean percentage recovery across Tier A triplets at eight checkpoints. Each subplot is one checkpoint. x-axis: layer index; y-axis: mean recovery. Percentage recovery is used here because rank recovery is binary and too coarse to track gradual signal emergence at early checkpoints. The y-axis extremes reflect the known instability of percentage recovery (§2.5). duce nearly identical probe… view at source ↗

read the original abstract

Authorship attribution models fine-tuned with the same pretrained encoder, data, and loss can differ four-fold in performance depending only on their scoring mechanism. We use mechanistic interpretability tools to explain this gap. Stylistic features such as word length, punctuation density, and function-word frequency are equally available at every layer in every model, including in an off-the-shelf control encoder, hence the gap not coming from representation quality. Instead, causal intervention shows that the scorer determines where the encoder consolidates authorship signal. Mean pooling forces consolidation by early to mid layers, while late interaction defers it to later layers. We further derive this difference from the gradient structure of each scorer, and training dynamics reveal distinct learning trajectories that follow from that difference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Scorer choice moves authorship signal consolidation across layers via gradient structure, with interventions showing the timing difference but resting on limited stylistic feature checks.

read the letter

The main takeaway is that the scoring head determines when the encoder consolidates authorship signal. Mean pooling leads to early or mid-layer consolidation while late interaction defers it, and this timing difference accounts for the large performance gap even when basic stylistic features remain available throughout the model including in a control encoder. They trace the difference back to the gradient structure of each scorer and show matching training trajectories. This is a direct mechanistic account rather than another accuracy table. The interventions and layer-wise checks give a concrete way to see why one head works better than another by a factor of four. The gradient derivation keeps the explanation from depending on the final numbers alone. The control encoder helps separate representation quality from consolidation timing. A soft spot is the set of features used to establish equal availability across layers. Word length, punctuation density, and function-word frequency are straightforward surface cues, but authorship often involves higher-order patterns such as syntactic preferences or rare word choices that might consolidate or interact differently. If those are not fully covered, the claim that timing alone drives the gap could need more qualification. The paper targets readers working on authorship attribution or mechanistic interpretability in encoder models. Someone thinking about head design or layer-wise learning would pick up usable ideas here. It deserves peer review because the methods are specific enough to test and the performance puzzle is real.

Referee Report

2 major / 2 minor

Summary. The paper claims that authorship attribution models fine-tuned with identical pretrained encoders, data, and loss can differ up to four-fold in performance solely due to the scoring mechanism. Using mechanistic interpretability, it shows that hand-selected stylistic features (word length, punctuation density, function-word frequency) are equally detectable across layers in all models including an off-the-shelf control encoder, ruling out representation quality as the cause. Causal interventions and gradient derivations instead demonstrate that mean pooling forces authorship-signal consolidation in early-to-mid layers while late interaction defers it to later layers, with supporting evidence from distinct training trajectories.

Significance. If the central claim holds, the work would provide a mechanistic account of how scorer choice shapes layer-wise consolidation of stylistic signals in encoder models, with direct implications for designing and interpreting authorship attribution systems and related stylistic NLP tasks. The combination of causal interventions, gradient analysis, and training dynamics offers a falsifiable explanation that could generalize beyond the specific features examined.

major comments (2)

[Abstract and results on feature availability] The conclusion that the performance gap arises exclusively from consolidation timing (rather than representation quality) rests on the claim that the selected stylistic features are equally available at every layer in every model, including the off-the-shelf control encoder. Because these features constitute only a subset of possible authorship cues, the manuscript must demonstrate that higher-order signals (syntactic preferences, rare lexical choices, discourse patterns) do not exhibit layer- or scorer-dependent differences that could account for the observed accuracy gap.
[Causal intervention experiments] The causal-intervention results that isolate the scorer's effect on consolidation location require explicit specification of the intervention protocol, the exact layers tested, the control conditions, and any statistical tests for significance. Without these details it is difficult to assess whether post-hoc choices or incomplete controls affect the layer-wise conclusions.

minor comments (2)

Define the precise implementation of the 'late interaction' scorer (including any architectural modifications to the encoder) at the first mention to aid readers who may not be familiar with the term.
Add a brief description of data exclusion rules, preprocessing steps, and the exact statistical methods used to compare feature detectability across layers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important areas for clarification and strengthening. We address each major comment point by point below.

read point-by-point responses

Referee: [Abstract and results on feature availability] The conclusion that the performance gap arises exclusively from consolidation timing (rather than representation quality) rests on the claim that the selected stylistic features are equally available at every layer in every model, including the off-the-shelf control encoder. Because these features constitute only a subset of possible authorship cues, the manuscript must demonstrate that higher-order signals (syntactic preferences, rare lexical choices, discourse patterns) do not exhibit layer- or scorer-dependent differences that could account for the observed accuracy gap.

Authors: We agree that the examined features represent a subset of possible authorship cues. The off-the-shelf encoder control already establishes that these low-level stylistic signals are detectable across layers without any fine-tuning for authorship, which helps isolate representation quality as not being the source of the gap. To address higher-order signals more directly, we will add probing experiments for syntactic dependency frequencies and discourse marker usage in the revised manuscript, confirming they exhibit comparable layer-wise availability patterns independent of scorer choice. revision: yes
Referee: [Causal intervention experiments] The causal-intervention results that isolate the scorer's effect on consolidation location require explicit specification of the intervention protocol, the exact layers tested, the control conditions, and any statistical tests for significance. Without these details it is difficult to assess whether post-hoc choices or incomplete controls affect the layer-wise conclusions.

Authors: We accept that the current description lacks sufficient detail on the experimental protocol. In the revision we will add a dedicated subsection specifying the full intervention protocol (activation replacement with neutral baselines derived from non-authorship examples), the exact layers tested (0 through 11), the control conditions (random neuron interventions and shuffled-label baselines), and the statistical tests (paired t-tests with Bonferroni correction, all key layer differences significant at p < 0.01). revision: yes

Circularity Check

0 steps flagged

Derivation relies on independent causal interventions and gradient analysis rather than self-referential fitting.

full rationale

The paper establishes that stylistic features are available at every layer through direct measurement in an off-the-shelf control encoder, providing an empirical basis independent of the model's fine-tuned performance. The consolidation location is then attributed to the scorer via causal interventions and derived from the gradient structure of mean pooling versus late interaction, along with observed training dynamics. These steps form a self-contained chain that does not reduce the final claims back to the input performance numbers or require self-citation for uniqueness. The analysis appears to use external benchmarks like the control encoder to rule out representation quality differences.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review yields limited visibility into free parameters or invented entities; no obvious fitted constants or new postulated objects are described.

axioms (1)

domain assumption Stylistic features remain equally detectable across layers in an untrained control encoder
Invoked to rule out representation quality as the source of the performance gap.

pith-pipeline@v0.9.0 · 5653 in / 1229 out tokens · 44401 ms · 2026-05-20T06:07:06.798359+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

The scorer term determines how that gradient distributes across individual tokens... Mean pooling: dense, uniform gradient... MaxSim: sparse, selective gradient.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 4 internal anchors

[1]

Same Author or Just Same Topic? Towards Content-Independent Style Representations , shorttitle =

Wegmann, Anna and Schraagen, Marijn and Nguyen, Dong , editor =. Same Author or Just Same Topic? Towards Content-Independent Style Representations , shorttitle =. Proceedings of the 7th Workshop on Representation Learning for NLP , publisher =. 2022 , pages =. doi:10.18653/v1/2022.repl4nlp-1.26 , urldate =

work page doi:10.18653/v1/2022.repl4nlp-1.26 2022
[2]

IDIOLEX: Unified and Continuous Representations for Idiolectal and Stylistic Variation

Kantharuban, Anjali and Srivastava, Aarohi and Faisal, Fahim and Ahia, Orevaoghene and Anastasopoulos, Antonios and Chiang, David and Tsvetkov, Yulia and Neubig, Graham , month = apr, year =. doi:10.48550/arXiv.2604.04704 , urldate =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.04704
[3]

Whodunit? Learning to Contrast for Authorship Attribution , shorttitle =

Ai, Bo and Wang, Yuchen and Tan, Yugin and Tan, Samson , editor =. Whodunit? Learning to Contrast for Authorship Attribution , shorttitle =. Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, Volume 1: Long Papers , publi...

work page doi:10.18653/v1/2022.aacl-main.84 2022
[4]

Isolating Authorship from Content with Semantic Embeddings and Contrastive Learning , url =

Huertas-Tato, Javier and Gir. Isolating Authorship from Content with Semantic Embeddings and Contrastive Learning , url =. 2024 , note =. doi:10.48550/arXiv.2411.18472 , urldate =

work page doi:10.48550/arxiv.2411.18472 2024
[5]

Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval , publisher =

Khattab, Omar and Zaharia, Matei , month = jul, year =. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval , publisher =. doi:10.1145/3397271.3401075 , urldate =

work page doi:10.1145/3397271.3401075
[6]

Localizing Model Behavior with Path Patching

Goldowsky-Dill, Nicholas and MacLeod, Chris and Sato, Lucas and Arora, Aryaman , month = may, year =. Localizing. doi:10.48550/arXiv.2304.05969 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2304.05969
[7]

Investigating Gender Bias in Language Models Using Causal Mediation Analysis , url =

Vig, Jesse and Gehrmann, Sebastian and Belinkov, Yonatan and Qian, Sharon and Nevo, Daniel and Singer, Yaron and Shieber, Stuart , booktitle =. Investigating Gender Bias in Language Models Using Causal Mediation Analysis , url =

work page
[8]

Zhang, Fred and Nanda, Neel , month = oct, year =. Towards

work page
[9]

5th International Conference on Learning Representations,

Guillaume Alain and Yoshua Bengio , title =. 5th International Conference on Learning Representations,. 2017 , url =

work page 2017
[10]

Belinkov, Yonatan , month = mar, year =. Probing. Computational Linguistics , publisher =. doi:10.1162/coli_a_00422 , abstract =

work page internal anchor Pith review doi:10.1162/coli_a_00422
[11]

What Does BERT Learn about the Structure of Language?

Jawahar, Ganesh and Sagot, Beno \^i t and Seddah, Djam \'e. What Does BERT Learn about the Structure of Language?. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1356

work page doi:10.18653/v1/p19-1356 2019
[12]

2019 , eprint=

Representation Learning with Contrastive Predictive Coding , author=. 2019 , eprint=

work page 2019
[13]

and Miano, Olivia Elizabeth and Ordonez, Juanita and Chen, Barry Y

Rivera-Soto, Rafael A. and Miano, Olivia Elizabeth and Ordonez, Juanita and Chen, Barry Y. and Khan, Aleem and Bishop, Marcus and Andrews, Nicholas , editor =. Learning Universal Authorship Representations , url =. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing , publisher =. 2021 , pages =. doi:10.18653/v1/2021.emn...

work page doi:10.18653/v1/2021.emnlp-main.70 2021
[14]

Proceedings of the 37th International Conference on Machine Learning , articleno =

Wang, Tongzhou and Isola, Phillip , title =. Proceedings of the 37th International Conference on Machine Learning , articleno =. 2020 , publisher =

work page 2020
[15]

Locating and editing factual associations in

Meng, Kevin and Bau, David and Andonian, Alex and Belinkov, Yonatan , month = nov, year =. Locating and editing factual associations in. Proceedings of the 36th

work page
[16]

BERT Rediscovers the Classical NLP Pipeline

Tenney, Ian and Das, Dipanjan and Pavlick, Ellie. BERT Rediscovers the Classical NLP Pipeline. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1452

work page doi:10.18653/v1/p19-1452 2019
[17]

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference , author =. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, Volume 1: Long Papers , month = jul, year =. doi:10.18653/v1/2025.acl-long.127 , pages =

work page doi:10.18653/v1/2025.acl-long.127 2025
[18]

Designing and Interpreting Probes with Control Tasks

Hewitt, John and Liang, Percy. Designing and Interpreting Probes with Control Tasks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. doi:10.18653/v1/D19-1275

work page doi:10.18653/v1/d19-1275 2019
[19]

Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?

Ravichander, Abhilasha and Belinkov, Yonatan and Hovy, Eduard. Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 2021. doi:10.18653/v1/2021.eacl-main.295

work page doi:10.18653/v1/2021.eacl-main.295 2021
[20]

Interpretability in the Wild: a Circuit for Indirect Object Identification in

Kevin Ro Wang and Alexandre Variengien and Arthur Conmy and Buck Shlegeris and Jacob Steinhardt , booktitle=. Interpretability in the Wild: a Circuit for Indirect Object Identification in. 2023 , url=

work page 2023
[21]

Literary and Linguistic Computing , author =

Delta: A Measure of Stylistic Difference and a Guide to Likely Authorship , volume =. Literary and Linguistic Computing , author =. 2002 , pages =. doi:10.1093/llc/17.3.267 , number =

work page doi:10.1093/llc/17.3.267 2002
[22]

AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs , volume =

Effects of Age and Gender on Blogging , author =. AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs , volume =

work page
[23]

Does It Capture

Wegmann, Anna and Nguyen, Dong , editor =. Does It Capture. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing , publisher =. 2021 , pages =. doi:10.18653/v1/2021.emnlp-main.569 , urldate =

work page doi:10.18653/v1/2021.emnlp-main.569 2021
[24]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , month = nov, year =

Layered Insights: Generalizable Analysis of Human Authorial Style by Leveraging All Transformer Layers , author =. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , month = nov, year =. doi:10.18653/v1/2025.emnlp-main.521 , pages =

work page doi:10.18653/v1/2025.emnlp-main.521 2025
[25]

Latent Space Interpretation for Stylistic Analysis and Explainable Authorship Attribution

Alshomary, Milad and Ri, Narutatsu and Apidianaki, Marianna and Patel, Ajay and Muresan, Smaranda and McKeown, Kathleen. Latent Space Interpretation for Stylistic Analysis and Explainable Authorship Attribution. Proceedings of the 31st International Conference on Computational Linguistics. 2025

work page 2025
[26]

2024 , eprint =

Text Embeddings by Weakly-Supervised Contrastive Pre-training , author =. 2024 , eprint =

work page 2024
[27]

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Liu, Yinhan and Ott, Myle and Goyal, Naman and Du, Jingfei and Joshi, Mandar and Chen, Danqi and Levy, Omer and Lewis, Mike and Zettlemoyer, Luke and Stoyanov, Veselin , year =. 1907.11692 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv 1907
[28]

International Conference on Learning Representations , year=

DeBERTA: Decoding-Enhanced BERT with Disentangled Attention , author=. International Conference on Learning Representations , year=

work page
[29]

Proceedings on Privacy Enhancing Technologies , author =

Git Blame Who? Stylistic Authorship Attribution of Small, Incomplete Source Code Fragments , volume =. Proceedings on Privacy Enhancing Technologies , author =. 2019 , pages =. doi:10.2478/popets-2019-0053 , number =

work page doi:10.2478/popets-2019-0053 2019
[30]

Science Advances , volume =

Cafiero, Florian and Camps, Jean-Baptiste , title =. Science Advances , volume =. 2019 , doi =

work page 2019
[31]

and Kaiser,

Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser,. Attention is all you need , year =. Proceedings of the 31st International Conference on Neural Information Processing Systems , pages =

work page
[32]

Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina , editor =. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1: Long and Short Papers , publisher =. 2019 , pages =. doi:10.18653/v1/N19-1423 , urldate =

work page doi:10.18653/v1/n19-1423 2019
[33]

2025 , eprint=

HALvest-Contrastive: Retrieval-Like Authorship Attribution with Patch-Level Late Interaction , author=. 2025 , eprint=

work page 2025
[34]

Journal of the American Statistical Association , volume=

Inference in an authorship problem: A comparative study of discrimination methods applied to the authorship of the disputed Federalist Papers , author=. Journal of the American Statistical Association , volume=. 1963 , publisher=

work page 1963
[35]

N-Gram-Based Author Profiles for Authorship Attribution , booktitle =

Ke. N-Gram-Based Author Profiles for Authorship Attribution , booktitle =. 2003 , pages =

work page 2003

[1] [1]

Same Author or Just Same Topic? Towards Content-Independent Style Representations , shorttitle =

Wegmann, Anna and Schraagen, Marijn and Nguyen, Dong , editor =. Same Author or Just Same Topic? Towards Content-Independent Style Representations , shorttitle =. Proceedings of the 7th Workshop on Representation Learning for NLP , publisher =. 2022 , pages =. doi:10.18653/v1/2022.repl4nlp-1.26 , urldate =

work page doi:10.18653/v1/2022.repl4nlp-1.26 2022

[2] [2]

IDIOLEX: Unified and Continuous Representations for Idiolectal and Stylistic Variation

Kantharuban, Anjali and Srivastava, Aarohi and Faisal, Fahim and Ahia, Orevaoghene and Anastasopoulos, Antonios and Chiang, David and Tsvetkov, Yulia and Neubig, Graham , month = apr, year =. doi:10.48550/arXiv.2604.04704 , urldate =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.04704

[3] [3]

Whodunit? Learning to Contrast for Authorship Attribution , shorttitle =

Ai, Bo and Wang, Yuchen and Tan, Yugin and Tan, Samson , editor =. Whodunit? Learning to Contrast for Authorship Attribution , shorttitle =. Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, Volume 1: Long Papers , publi...

work page doi:10.18653/v1/2022.aacl-main.84 2022

[4] [4]

Isolating Authorship from Content with Semantic Embeddings and Contrastive Learning , url =

Huertas-Tato, Javier and Gir. Isolating Authorship from Content with Semantic Embeddings and Contrastive Learning , url =. 2024 , note =. doi:10.48550/arXiv.2411.18472 , urldate =

work page doi:10.48550/arxiv.2411.18472 2024

[5] [5]

Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval , publisher =

Khattab, Omar and Zaharia, Matei , month = jul, year =. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval , publisher =. doi:10.1145/3397271.3401075 , urldate =

work page doi:10.1145/3397271.3401075

[6] [6]

Localizing Model Behavior with Path Patching

Goldowsky-Dill, Nicholas and MacLeod, Chris and Sato, Lucas and Arora, Aryaman , month = may, year =. Localizing. doi:10.48550/arXiv.2304.05969 , abstract =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2304.05969

[7] [7]

Investigating Gender Bias in Language Models Using Causal Mediation Analysis , url =

Vig, Jesse and Gehrmann, Sebastian and Belinkov, Yonatan and Qian, Sharon and Nevo, Daniel and Singer, Yaron and Shieber, Stuart , booktitle =. Investigating Gender Bias in Language Models Using Causal Mediation Analysis , url =

work page

[8] [8]

Zhang, Fred and Nanda, Neel , month = oct, year =. Towards

work page

[9] [9]

5th International Conference on Learning Representations,

Guillaume Alain and Yoshua Bengio , title =. 5th International Conference on Learning Representations,. 2017 , url =

work page 2017

[10] [10]

Belinkov, Yonatan , month = mar, year =. Probing. Computational Linguistics , publisher =. doi:10.1162/coli_a_00422 , abstract =

work page internal anchor Pith review doi:10.1162/coli_a_00422

[11] [11]

What Does BERT Learn about the Structure of Language?

Jawahar, Ganesh and Sagot, Beno \^i t and Seddah, Djam \'e. What Does BERT Learn about the Structure of Language?. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1356

work page doi:10.18653/v1/p19-1356 2019

[12] [12]

2019 , eprint=

Representation Learning with Contrastive Predictive Coding , author=. 2019 , eprint=

work page 2019

[13] [13]

and Miano, Olivia Elizabeth and Ordonez, Juanita and Chen, Barry Y

Rivera-Soto, Rafael A. and Miano, Olivia Elizabeth and Ordonez, Juanita and Chen, Barry Y. and Khan, Aleem and Bishop, Marcus and Andrews, Nicholas , editor =. Learning Universal Authorship Representations , url =. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing , publisher =. 2021 , pages =. doi:10.18653/v1/2021.emn...

work page doi:10.18653/v1/2021.emnlp-main.70 2021

[14] [14]

Proceedings of the 37th International Conference on Machine Learning , articleno =

Wang, Tongzhou and Isola, Phillip , title =. Proceedings of the 37th International Conference on Machine Learning , articleno =. 2020 , publisher =

work page 2020

[15] [15]

Locating and editing factual associations in

Meng, Kevin and Bau, David and Andonian, Alex and Belinkov, Yonatan , month = nov, year =. Locating and editing factual associations in. Proceedings of the 36th

work page

[16] [16]

BERT Rediscovers the Classical NLP Pipeline

Tenney, Ian and Das, Dipanjan and Pavlick, Ellie. BERT Rediscovers the Classical NLP Pipeline. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1452

work page doi:10.18653/v1/p19-1452 2019

[17] [17]

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference , author =. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, Volume 1: Long Papers , month = jul, year =. doi:10.18653/v1/2025.acl-long.127 , pages =

work page doi:10.18653/v1/2025.acl-long.127 2025

[18] [18]

Designing and Interpreting Probes with Control Tasks

Hewitt, John and Liang, Percy. Designing and Interpreting Probes with Control Tasks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. doi:10.18653/v1/D19-1275

work page doi:10.18653/v1/d19-1275 2019

[19] [19]

Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?

Ravichander, Abhilasha and Belinkov, Yonatan and Hovy, Eduard. Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 2021. doi:10.18653/v1/2021.eacl-main.295

work page doi:10.18653/v1/2021.eacl-main.295 2021

[20] [20]

Interpretability in the Wild: a Circuit for Indirect Object Identification in

Kevin Ro Wang and Alexandre Variengien and Arthur Conmy and Buck Shlegeris and Jacob Steinhardt , booktitle=. Interpretability in the Wild: a Circuit for Indirect Object Identification in. 2023 , url=

work page 2023

[21] [21]

Literary and Linguistic Computing , author =

Delta: A Measure of Stylistic Difference and a Guide to Likely Authorship , volume =. Literary and Linguistic Computing , author =. 2002 , pages =. doi:10.1093/llc/17.3.267 , number =

work page doi:10.1093/llc/17.3.267 2002

[22] [22]

AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs , volume =

Effects of Age and Gender on Blogging , author =. AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs , volume =

work page

[23] [23]

Does It Capture

Wegmann, Anna and Nguyen, Dong , editor =. Does It Capture. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing , publisher =. 2021 , pages =. doi:10.18653/v1/2021.emnlp-main.569 , urldate =

work page doi:10.18653/v1/2021.emnlp-main.569 2021

[24] [24]

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , month = nov, year =

Layered Insights: Generalizable Analysis of Human Authorial Style by Leveraging All Transformer Layers , author =. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing , month = nov, year =. doi:10.18653/v1/2025.emnlp-main.521 , pages =

work page doi:10.18653/v1/2025.emnlp-main.521 2025

[25] [25]

Latent Space Interpretation for Stylistic Analysis and Explainable Authorship Attribution

Alshomary, Milad and Ri, Narutatsu and Apidianaki, Marianna and Patel, Ajay and Muresan, Smaranda and McKeown, Kathleen. Latent Space Interpretation for Stylistic Analysis and Explainable Authorship Attribution. Proceedings of the 31st International Conference on Computational Linguistics. 2025

work page 2025

[26] [26]

2024 , eprint =

Text Embeddings by Weakly-Supervised Contrastive Pre-training , author =. 2024 , eprint =

work page 2024

[27] [27]

RoBERTa: A Robustly Optimized BERT Pretraining Approach

Liu, Yinhan and Ott, Myle and Goyal, Naman and Du, Jingfei and Joshi, Mandar and Chen, Danqi and Levy, Omer and Lewis, Mike and Zettlemoyer, Luke and Stoyanov, Veselin , year =. 1907.11692 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv 1907

[28] [28]

International Conference on Learning Representations , year=

DeBERTA: Decoding-Enhanced BERT with Disentangled Attention , author=. International Conference on Learning Representations , year=

work page

[29] [29]

Proceedings on Privacy Enhancing Technologies , author =

Git Blame Who? Stylistic Authorship Attribution of Small, Incomplete Source Code Fragments , volume =. Proceedings on Privacy Enhancing Technologies , author =. 2019 , pages =. doi:10.2478/popets-2019-0053 , number =

work page doi:10.2478/popets-2019-0053 2019

[30] [30]

Science Advances , volume =

Cafiero, Florian and Camps, Jean-Baptiste , title =. Science Advances , volume =. 2019 , doi =

work page 2019

[31] [31]

and Kaiser,

Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser,. Attention is all you need , year =. Proceedings of the 31st International Conference on Neural Information Processing Systems , pages =

work page

[32] [32]

Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina , editor =. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1: Long and Short Papers , publisher =. 2019 , pages =. doi:10.18653/v1/N19-1423 , urldate =

work page doi:10.18653/v1/n19-1423 2019

[33] [33]

2025 , eprint=

HALvest-Contrastive: Retrieval-Like Authorship Attribution with Patch-Level Late Interaction , author=. 2025 , eprint=

work page 2025

[34] [34]

Journal of the American Statistical Association , volume=

Inference in an authorship problem: A comparative study of discrimination methods applied to the authorship of the disputed Federalist Papers , author=. Journal of the American Statistical Association , volume=. 1963 , publisher=

work page 1963

[35] [35]

N-Gram-Based Author Profiles for Authorship Attribution , booktitle =

Ke. N-Gram-Based Author Profiles for Authorship Attribution , booktitle =. 2003 , pages =

work page 2003