RogueAI operationalizes a reverse Turing test as a one-on-two interrogation game to detect licensed deception in LLMs, with pilot data from 467 sessions showing a simple linguistic heuristic at 75.6% accuracy versus 56.6% for human players.
Hagendorff, Deception abilities emerged in large language models, arXiv:2307.16513, 2023
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
RogueAI: A Reverse Turing Test for Detecting Licensed AI Deception in Dialogue
RogueAI operationalizes a reverse Turing test as a one-on-two interrogation game to detect licensed deception in LLMs, with pilot data from 467 sessions showing a simple linguistic heuristic at 75.6% accuracy versus 56.6% for human players.