Text Analysis in Adversarial Settings: Does Deception Leave a Stylistic Trace?

N. Asokan; Tommi Gröndahl

arxiv: 1902.08939 · v2 · pith:OVRB77H2new · submitted 2019-02-24 · 💻 cs.CL

classification 💻 cs.CL

keywords deception, style, obfuscation, semantic, text, author, certain, deceptiveness

Textual deception constitutes a major problem for online security. Many studies have argued that deceptiveness leaves traces in writing style, which could be detected using text classification techniques. By conducting an extensive literature review of existing empirical work, we demonstrate that while certain linguistic features have been indicative of deception in certain corpora, they fail to generalize across divergent semantic domains. We suggest that deceptiveness as such leaves no content-invariant stylistic trace, and textual similarity measures provide superior means of classifying texts as potentially deceptive. Additionally, we discuss forms of deception beyond semantic content, focusing on hiding author identity by writing style obfuscation. Surveying the literature on both author identification and obfuscation techniques, we conclude that current style transformation methods fail to achieve reliable obfuscation while simultaneously ensuring semantic faithfulness to the original text. We propose that future work in style transformation should pay particular attention to disallowing semantically drastic changes.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Assessing the Applicability of Authorship Verification Methods
cs.LG 2019-06 unverdicted novelty 5.0

Some authorship verification methods reach 72.7% accuracy on 250-character informal chats and over 75% on scientific documents separated by 15.6 years on average, but all fail on cross-topic verification.