arxiv: 2605.10606 · v1 · submitted 2026-05-11 · 💻 cs.CL · cs.AI

Recognition: no theorem link

Measuring Embedding Sensitivity to Authorial Style in French: Comparing Literary Texts with Language Model Rewritings

Benjamin Icard , Lila Sainero , Alice Breton , Evangelia Zve , Jean-Gabriel Ganascia

Authors on Pith no claims yet

Pith reviewed 2026-05-12 05:08 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords embeddingsauthorial styleLLM rewritingFrench literaturestylistic analysisauthorship detectionlanguage models

0 comments

The pith

Embeddings capture French author style reliably and retain it after LLM rewriting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether language model embeddings encode author-specific stylistic signals in French literary texts and whether those signals survive when the texts are rewritten by large language models. It measures stylistic variation through shifts in embedding dispersion on a controlled literary dataset. The findings indicate that embeddings consistently reflect authorial style, that these patterns remain detectable after rewriting, and that each LLM introduces its own characteristic alterations to the embedding space. This provides a concrete basis for detecting style imitation in an era when LLMs can mimic human writing.

Core claim

Embeddings reliably capture authorial stylistic features and these signals persist after rewriting, while also exhibiting LLM-specific patterns.

What carries the argument

Changes in embedding dispersion as a quantitative measure of stylistic variation between original French literary texts and their LLM-rewritten versions.

If this is right

Stylistic information in embeddings can support quantitative authorship attribution even in rewritten text.
LLM rewriting does not fully erase original author signals in embedding space.
Different language models produce distinct, measurable shifts in how author style appears in embeddings.
The dispersion-based approach supplies a practical metric for studying style imitation by generative models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same dispersion method could be tested on non-literary or non-French texts to check whether style capture generalizes.
If dispersion tracks style independently of topic and length, it could inform the design of style-transfer systems that intentionally preserve or modify author voice.
Combining embedding dispersion with other signals such as syntactic patterns might strengthen detection of AI-assisted authorship imitation.

Load-bearing premise

Changes in embedding dispersion specifically and accurately quantify authorial stylistic variation rather than being driven by other factors such as text length, topic, or rewriting artifacts.

What would settle it

If dispersion changes correlate more strongly with text length, topic, or surface-level rewriting artifacts than with known differences between authors when those factors are controlled.

Figures

Figures reproduced from arXiv: 2605.10606 by Alice Breton, Benjamin Icard, Evangelia Zve, Jean-Gabriel Ganascia, Lila Sainero.

**Figure 2.** Figure 2: 2D UMAP projection of xlm-roberta-large embeddings on the dataset. Points indicate the three kmeans clusters (Class 1-3), while dashed ellipses (visual guides, not k-means clusters) indicate label-based coverage regions for the human and generated corpora, drawn so that exactly 80% of the corpus lie inside the ellipse zone. The dashed ellipses in [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Pearson correlations (r) between embedding dispersion shifts in 2D UMAP reduction (∆d) and stylistic feature shifts (∆f s ) for each author-labeled corpus, comparing TUFFERY_REF with (a) the three human-authored corpora in STYLE_REF, and (b) the three style-imitated corpora in STYLE_GEN. Asterisks indicate p < 0.01 after Bonferroni correction. 4.3 Embedding Sensitivity to Authorial Style To assess embeddin… view at source ↗

**Figure 4.** Figure 4: Pearson correlations (r) between embedding dispersion shifts in 2D UMAP reduction (∆d) and stylistic feature shifts (∆f s ) per LLM imitator, comparing TUFFERY_REF with (a) PROUST_GEN, (b) CELINE_GEN, and (c) YOURCENAR_GEN. Asterisks indicate p < 0.01, with Bonferroni correction. 5 Discussion LLM Stylistic Fidelity [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: complements the aggregate performance results given in [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗

**Figure 6.** Figure 6: Pearson correlations (r) between embedding dispersion shifts in FullD (∆d) and stylistic feature shifts (∆f s ) for each author-labeled corpus, comparing TUFFERY_REF with (a) the three human-authored corpora in STYLE_REF, and (b) the three style-imitated corpora in STYLE_GEN. Asterisks indicate p < 0.01 after Bonferroni correction. of Entropy and the appearance of Structural. For YOURCENAR_REF, the expecte… view at source ↗

read the original abstract

Large language models (LLMs) can convincingly imitate human writing styles, yet it remains unclear how much stylistic information is encoded in embeddings from any language model and retained after LLM rewriting. We investigate these questions in French, using a controlled literary dataset to quantify the effect of stylistic variation via changes in embedding dispersion. We observe that embeddings reliably capture authorial stylistic features and that these signals persist after rewriting, while also exhibiting LLM-specific patterns. These analytical results offer promising directions for authorship imitation detection in the era of language models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper measures embedding dispersion on French literary texts before and after LLM rewriting and reports that authorial style signals persist, but the design leaves open whether dispersion truly isolates style or just tracks length and rewriting side effects.

read the letter

The main point is that this work tracks how embedding dispersion shifts when French novels get rewritten by LLMs and concludes that authorial style remains detectable while the rewrites add their own signatures. It sets up a direct before-and-after comparison on literary data, which is a straightforward way to test retention of stylistic information in embeddings for a language that gets less coverage than English. That controlled literary dataset and the focus on dispersion as a style proxy are the concrete contributions here. The observations on persistence and LLM-specific patterns could be useful starting points for detection tools. The soft spot is the handling of confounds. The abstract gives no sign of length normalization, topic matching across authors, or checks that rule out syntactic or semantic artifacts introduced by the rewriting process itself. If those are missing from the full methods, then the claim that embeddings reliably capture authorial style rests on an untested assumption that dispersion changes are style-driven rather than driven by other text properties. No dataset sizes, model names, or statistical details appear either, which makes it difficult to judge how stable the reported patterns actually are. This is the kind of paper that would interest people working on authorship attribution or AI-text detection in non-English settings. A reader looking for empirical baselines on style retention in French would get some value, though they would need to treat the isolation of style as provisional until controls are shown. It deserves a serious referee so the authors can supply the missing robustness checks and data details. I would not cite it in its current form, but the question is timely enough that review makes sense.

Referee Report

2 major / 1 minor

Summary. The manuscript investigates how embeddings encode authorial style in French literary texts by quantifying stylistic variation through changes in embedding dispersion. It compares original texts against LLM-rewritten versions and reports that embeddings reliably capture authorial stylistic features, that these signals persist after rewriting, and that distinct patterns emerge across different LLMs. The results are framed as offering directions for detecting authorship imitation in the LLM era.

Significance. If the dispersion-based measurements prove robust, the work would provide a concrete empirical approach to assessing style encoding in embeddings for a non-English literary corpus and to evaluating how LLM rewriting preserves or alters stylistic signals. This could support stylometric methods and AI-text detection tools, particularly given the use of a controlled literary dataset. The absence of reported statistical details, model specifications, or confound controls in the abstract, however, prevents a full evaluation of whether the observations isolate authorial style.

major comments (2)

[Abstract and Methods] Abstract and experimental setup: The central claim that embedding dispersion specifically indexes authorial stylistic features (and their persistence post-rewriting) is load-bearing, yet the description provides no indication of length normalization, topic matching across authors, or ablation of LLM-induced syntactic/semantic artifacts. Without these, observed shifts risk reflecting confounds rather than style, as noted in the stress-test concern.
[Results] Results and interpretation: The observations that 'embeddings reliably capture authorial stylistic features' and 'signals persist after rewriting' are presented without data sizes, statistical methods, specific LLMs, error analysis, or quantitative effect sizes. This makes it impossible to verify support for the claims or to distinguish LLM-specific patterns from artifacts.

minor comments (1)

[Abstract] The abstract would be strengthened by including at least one key quantitative result (e.g., dispersion delta values or statistical significance) to ground the stated observations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help us clarify the methodological controls and quantitative reporting in our work. We respond to each major point below, drawing on details from the full manuscript, and note the revisions we will implement.

read point-by-point responses

Referee: [Abstract and Methods] Abstract and experimental setup: The central claim that embedding dispersion specifically indexes authorial stylistic features (and their persistence post-rewriting) is load-bearing, yet the description provides no indication of length normalization, topic matching across authors, or ablation of LLM-induced syntactic/semantic artifacts. Without these, observed shifts risk reflecting confounds rather than style, as noted in the stress-test concern.

Authors: We agree the abstract is too concise to list these elements. The full Methods section specifies that texts were drawn from a controlled literary corpus with authors matched by genre and historical period to limit topical confounds, and that all samples were truncated to identical token lengths prior to embedding. A dedicated stress-test subsection compares dispersion shifts under LLM rewriting to those from random syntactic and lexical perturbations. We will revise the abstract to reference these controls explicitly. revision: yes
Referee: [Results] Results and interpretation: The observations that 'embeddings reliably capture authorial stylistic features' and 'signals persist after rewriting' are presented without data sizes, statistical methods, specific LLMs, error analysis, or quantitative effect sizes. This makes it impossible to verify support for the claims or to distinguish LLM-specific patterns from artifacts.

Authors: The Results section reports the corpus composition, applies statistical comparisons (including significance testing and effect-size metrics) to dispersion values, names the LLMs used for rewriting, and presents error analysis together with LLM-specific pattern quantification via comparative metrics and figures. We will add a compact summary of these elements to the abstract to improve verifiability. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical measurements with no derivations or self-referential fits

full rationale

The paper reports direct empirical observations of embedding dispersion changes across original literary texts and LLM rewritings in French. No equations, parameter fittings, predictions derived from subsets of the same data, or load-bearing self-citations appear in the provided abstract or description. The central claims rest on controlled dataset comparisons rather than any reduction of results to inputs by construction, satisfying the criteria for a self-contained non-circular analysis.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not specify any free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5393 in / 1024 out tokens · 56487 ms · 2026-05-12T05:08:03.984076+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

124 extracted references · 124 canonical work pages · 5 internal anchors

[1]

Collection Folio , year=

Exercices de Style: Edition Gallimard , author=. Collection Folio , year=

work page
[2]

ACM Transactions on Intelligent Systems and Technology , volume=

Explainability for large language models: A survey , author=. ACM Transactions on Intelligent Systems and Technology , volume=. 2024 , publisher=

work page 2024
[3]

Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) , pages=

Glove: Global vectors for word representation , author=. Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) , pages=

work page 2014
[4]

Journal of machine Learning research , volume=

Latent dirichlet allocation , author=. Journal of machine Learning research , volume=

work page
[5]

Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning , pages=

Characterizing stylistic elements in syntactic structure , author=. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning , pages=

work page 2012
[6]

1970 , publisher=

F. 1970 , publisher=

work page 1970
[7]

Effective Purity Method for Measuring the Clustering Accuracy and its Illustration , volume =

Sikhakolli, Srinivasa and Sikhakolli, Asha Kiran , year =. Effective Purity Method for Measuring the Clustering Accuracy and its Illustration , volume =. International Journal of Computer Applications , doi =

work page
[8]

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , author=. arXiv preprint arXiv:1810.04805 , url=

work page internal anchor Pith review Pith/arXiv arXiv
[9]

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , pages=

Writing Style Author Embedding Evaluation , author=. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , pages=. 2021 , organization=

work page 2021
[10]

Scientific reports , volume=

Authorship identification using ensemble learning , author=. Scientific reports , volume=. 2022 , publisher=

work page 2022
[11]

CLEF 2023 Working Notes , year=

A Writing Style Embedding Based on Contrastive Learning for Multi-Author Writing Style Analysis , author=. CLEF 2023 Working Notes , year=

work page 2023
[12]

Proceedings of the 7th Workshop on Representation Learning for NLP , pages=

Same Author or Just Same Topic? Towards Content-Independent Style Representations , author=. Proceedings of the 7th Workshop on Representation Learning for NLP , pages=. 2022 , organization=

work page 2022
[13]

BERTopic: Neural topic modeling with a class-based TF-IDF procedure

BERTopic: Neural topic modeling with a class-based TF-IDF procedure , author=. arXiv preprint arXiv:2203.05794 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[14]

and Solorio, Thamar

Maharjan, Suraj and Mave, Deepthi and Shrestha, Prasha and Montes, Manuel and Gonz \'a lez, Fabio A. and Solorio, Thamar. Jointly Learning Author and Annotated Character N-gram Embeddings: A Case Study in Literary Text. Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019). 2019

work page 2019
[15]

Transactions of the Association for Computational Linguistics , volume=

Topic modeling in embedding spaces , author=. Transactions of the Association for Computational Linguistics , volume=. 2020 , publisher=

work page 2020
[16]

arXiv preprint arXiv:1909.08349 , year=

A lexical, syntactic, and semantic perspective for understanding style in text , author=. arXiv preprint arXiv:1909.08349 , year=

work page arXiv 1909
[17]

2022 , publisher=

Computational modeling of narrative , author=. 2022 , publisher=

work page 2022
[18]

Sentence- BERT : Sentence Embeddings using S iamese BERT -Networks

Reimers, Nils and Gurevych, Iryna. Sentence- BERT : Sentence Embeddings using S iamese BERT -Networks. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019. doi:10.18653/v1/D19-1410

work page doi:10.18653/v1/d19-1410 2019
[19]

Handbook of Empirical Literary Studies , pages=

Computational stylistics , author=. Handbook of Empirical Literary Studies , pages=. 2021 , url=

work page 2021
[20]

arXiv preprint arXiv:1905.05621 , year=

Style transformer: Unpaired text style transfer without disentangled latent representation , author=. arXiv preprint arXiv:1905.05621 , year=

work page arXiv 1905
[21]

2020 , publisher=

Embeddings in natural language processing: Theory and advances in vector representations of meaning , author=. 2020 , publisher=

work page 2020
[22]

WWW (Companion volume) , pages=

Author2Vec: Learning Author Representations by Combining Content and Link Information , author=. WWW (Companion volume) , pages=

work page
[23]

Nature Biotechnology , volume =

Initialization is critical for preserving global data structure in both t-SNE and UMAP , author =. Nature Biotechnology , volume =. 2021 , doi =

work page 2021
[24]

Journal of Open Source Software , volume =

UMAP: Uniform Manifold Approximation and Projection , author =. Journal of Open Source Software , volume =. 2018 , doi =

work page 2018
[25]

5th online world conference on soft computing in industrial applications (WSC5) , volume=

The curse of dimensionality , author=. 5th online world conference on soft computing in industrial applications (WSC5) , volume=

work page
[26]

Proceedings of the sixth new zealand computer science research student conference (NZCSRSC2008), Christchurch, New Zealand , volume=

Similarity measures for text document clustering , author=. Proceedings of the sixth new zealand computer science research student conference (NZCSRSC2008), Christchurch, New Zealand , volume=

work page
[27]

Universal Sentence Encoder

Universal sentence encoder , author=. arXiv preprint arXiv:1803.11175 , url=

work page Pith review arXiv
[28]

International conference on machine learning , pages=

Distributed representations of sentences and documents , author=. International conference on machine learning , pages=. 2014 , url=

work page 2014
[29]

2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003

Robust data clustering , author=. 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings. , volume=. 2003 , organization=

work page 2003
[30]

International Journal of Future Computer and Communication , volume=

Clutching of Clustering Validation Criteria , author=. International Journal of Future Computer and Communication , volume=. 2024 , url=

work page 2024
[31]

Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability , volume=

Some methods for classification and analysis of multivariate observations , author=. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability , volume=

work page
[32]

Technometrics , volume=

Principal component analysis , author=. Technometrics , volume=. 1987 , url=

work page 1987
[33]

Proceedings of the International Conference on Data Mining , year=

Purity: A new measure for clustering evaluation , author=. Proceedings of the International Conference on Data Mining , year=

work page
[34]

Journal of Machine Learning Research , volume=

Relationship-based clustering and cluster ensembles , author=. Journal of Machine Learning Research , volume=

work page
[35]

2017 , eprint=

Simple and Effective Dimensionality Reduction for Word Embeddings , author=. 2017 , eprint=

work page 2017
[36]

Cross-validation methods in principal component analysis: A comparison , volume =

Diana, Giancarlo and Tommasi, Chiara , year =. Cross-validation methods in principal component analysis: A comparison , volume =. Statistical Methods & Applications , doi =

work page
[37]

Statistical Methods & Applications , volume=

Cross-validation methods in principal component analysis: A comparison , author=. Statistical Methods & Applications , volume=. 2002 , publisher=

work page 2002
[38]

I. T. Jolliffe , title =. 2016 , publisher =

work page 2016
[39]

A Tutorial on Principal Component Analysis

J. Shlens , title =. arXiv preprint arXiv:1404.1100 , year =

work page Pith review arXiv
[40]

and others , title =

Aliakbar, A. and others , title =. Journal of Data Science , year =

work page
[41]

and others , title =

Fachada, J. and others , title =. Data Mining and Knowledge Discovery , year =

work page
[42]

Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems , pages=

Directed diversity: Leveraging language embedding distances for collective creativity in crowd ideation , author=. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems , pages=

work page 2021
[43]

Computer Science Review , volume=

Conceptual and empirical comparison of dimensionality reduction algorithms (pca, kpca, lda, mds, svd, lle, isomap, le, ica, t-sne) , author=. Computer Science Review , volume=. 2021 , publisher=

work page 2021
[44]

, author=

Visualizing data using t-SNE. , author=. Journal of machine learning research , volume=

work page
[45]

and others , title =

Holland, P. and others , title =. Machine Learning Journal , year =

work page
[46]

GPT-4 Technical Report

Gpt-4 technical report , author=. arXiv preprint arXiv:2303.08774 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[47]

Mistral 7B

Mistral 7B , author=. arXiv preprint arXiv:2310.06825 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[48]

2011 International Conference on Document Analysis and Recognition , pages=

Writer identification using TF-IDF for cursive handwritten word recognition , author=. 2011 International Conference on Document Analysis and Recognition , pages=. 2011 , url=

work page 2011
[49]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Self-attention attribution: Interpreting information interactions inside transformer , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

work page
[50]

Working Notes of CLEF , url=

Team liuc0757 at PAN: A Writing Style Embedding Method Based on Contrastive Learning for Multi-Author Writing Style Analysis , author=. Working Notes of CLEF , url=

work page
[51]

European Conference on Information Retrieval , pages=

Overview of PAN 2024: multi-author writing style analysis, multilingual text detoxification, oppositional thinking analysis, and generative AI authorship verification , author=. European Conference on Information Retrieval , pages=. 2024 , organization=

work page 2024
[52]

Transactions of the Association for Computational Linguistics , volume=

Can Authorship Representation Learning Capture Stylistic Features? , author=. Transactions of the Association for Computational Linguistics , volume=. 2023 , publisher=

work page 2023
[53]

Proceedings of the 58th annual meeting of the association for computational linguistics , pages=

tBERT: Topic models and BERT joining forces for semantic similarity detection , author=. Proceedings of the 58th annual meeting of the association for computational linguistics , pages=

work page
[54]

2014 , publisher=

The statistical study of literary vocabulary , author=. 2014 , publisher=

work page 2014
[55]

The Bell system technical journal , volume=

A mathematical theory of communication , author=. The Bell system technical journal , volume=. 1948 , publisher=

work page 1948
[56]

Derivation of new readability formulas (Automated readability index, fog count, and flesch reading ease formula) for Navy enlisted personnel , author=

work page
[57]

2008 , publisher=

Introduction to information retrieval , author=. 2008 , publisher=

work page 2008
[58]

Journal of statistical mechanics: Theory and experiment , volume=

Comparing community structure identification , author=. Journal of statistical mechanics: Theory and experiment , volume=. 2005 , publisher=

work page 2005
[59]

Plos one , volume=

Detection of changes in literary writing style using N-grams as style markers and supervised machine learning , author=. Plos one , volume=. 2022 , publisher=

work page 2022
[60]

2018 IEEE 5th International Conference on data science and advanced analytics (DSAA) , pages=

Explaining explanations: An overview of interpretability of machine learning , author=. 2018 IEEE 5th International Conference on data science and advanced analytics (DSAA) , pages=. 2018 , organization=

work page 2018
[61]

MTEB: Massive Text Embedding Benchmark

MTEB: Massive text embedding benchmark , author=. arXiv preprint arXiv:2210.07316 , year=

work page internal anchor Pith review arXiv
[62]

Exposing propaganda: an analysis of stylistic cues comparing human annotations and machine classification , booktitle =

Faye, G. Exposing propaganda: an analysis of stylistic cues comparing human annotations and machine classification , booktitle =. 2024 , address =

work page 2024
[63]

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) , pages=

A Multi-Label Dataset of French Fake News: Human and Machine Insights , author=. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) , pages=

work page 2024
[64]

Embedding Style Beyond Topics: Analyzing Dispersion Effects Across Different Language Models

Icard, Benjamin and Zve, Evangelia and Sainero, Lila and Breton, Alice and Ganascia, Jean-Gabriel. Embedding Style Beyond Topics: Analyzing Dispersion Effects Across Different Language Models. Proceedings of the 31st International Conference on Computational Linguistics. 2025

work page 2025
[65]

Kumarage, G

A survey of ai-generated text forensic systems: Detection, attribution, and characterization , author=. arXiv preprint arXiv:2403.01152 , year=

work page arXiv
[66]

Advances in neural information processing systems , volume=

Attention is all you need , author=. Advances in neural information processing systems , volume=. 2017 , url=

work page 2017
[67]

Advances in neural information processing systems , volume=

Training language models to follow instructions with human feedback , author=. Advances in neural information processing systems , volume=

work page
[68]

Paris: Cylibris , year=

Le style mode d’emploi , author=. Paris: Cylibris , year=

work page
[69]

À la recherche du temps perdu , volume =

Marcel Proust , title =. À la recherche du temps perdu , volume =. 1913 , publisher =

work page 1913
[70]

1932 , publisher=

Voyage au bout de la nuit , author=. 1932 , publisher=

work page 1932
[71]

1951 , publisher=

Mémoires d'Hadrien , author=. 1951 , publisher=

work page 1951
[72]

DOI: 10.1109/TVCG.2023.3326569 , year =

Large-Scale Evaluation of Topic Models and Dimensionality Reduction Methods for 2D Text Spatialization , author =. DOI: 10.1109/TVCG.2023.3326569 , year =

work page doi:10.1109/tvcg.2023.3326569 2023
[73]

, author=

hdbscan: Hierarchical density based clustering. , author=. J. Open Source Softw. , volume=

work page
[74]

2020 IEEE 7th international conference on data science and advanced analytics (DSAA) , pages=

Cluster quality analysis using silhouette score , author=. 2020 IEEE 7th international conference on data science and advanced analytics (DSAA) , pages=. 2020 , organization=

work page 2020
[75]

T iny S tyler: Efficient Few‑Shot Text Style Transfer with Authorship Embeddings

Horvitz, Zachary and Patel, Ajay and Singh, Kanishk and Callison‑Burch, Chris and McKeown, Kathleen and Yu, Zhou. T iny S tyler: Efficient Few‑Shot Text Style Transfer with Authorship Embeddings. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.781

work page doi:10.18653/v1/2024.findings-emnlp.781 2024
[76]

RAG s to Style: Personalizing LLMs with Style Embeddings

Neelakanteswara, Abhiman and Chaudhari, Shreyas and Zamani, Hamed. RAG s to Style: Personalizing LLMs with Style Embeddings. Proceedings of the 1st Workshop on Personalization of Generative AI Systems (PERSONALIZE 2024). 2024

work page 2024
[77]

Style Extraction on Text Embeddings Using VAE and Parallel Dataset

Kong, InJin and Kang, Shinyee and Park, Yuna and Kim, Sooyong and Park, Sanghyun. Style Extraction on Text Embeddings Using VAE and Parallel Dataset. 2025

work page 2025
[78]

Robust AI-Generated Text Detection by Restricted Embeddings

Kuznetsov, Kristian and Tulchinskii, Eduard and Kushnareva, Laida and Magai, German and Barannikov, Serguei and Nikolenko, Sergey and Piontkovskaya, Irina. Robust AI-Generated Text Detection by Restricted Embeddings. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.992

work page doi:10.18653/v1/2024.findings-emnlp.992 2024
[79]

Journal of the royal statistical society

Algorithm AS 136: A k-means clustering algorithm , author=. Journal of the royal statistical society. series c (applied statistics) , volume=. 1979 , url=

work page 1979
[80]

Language Resources and Evaluation , volume =

Moshe Koppel and Jonathan Schler and Shlomo Argamon , title =. Language Resources and Evaluation , volume =. 2011 , doi =

work page 2011

Showing first 80 references.