Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts

2025 · cs.LG · arXiv 2508.06361

1 Pith paper cites this work.

abstract

Large Language Models (LLMs) are widely deployed in reasoning, planning, and decision-making tasks, making their trustworthiness critical. A significant and underexplored risk is intentional deception, where an LLM deliberately fabricates or conceals information to serve a hidden objective. Existing studies typically induce deception by explicitly setting a hidden objective through prompting or fine-tuning, which may not reflect real-world human-LLM interactions. Moving beyond such human-induced deception, we investigate LLMs' self-initiated deception on benign prompts. To address the absence of ground truth, we propose a framework based on Contact Searching Questions (CSQ). This framework introduces two statistical metrics derived from psychological principles to quantify the likelihood of deception. The first, the Deceptive Intention Score, measures the model's bias toward a hidden objective. The second, the Deceptive Behavior Score, measures the inconsistency between the LLM's internal belief and its expressed output. Evaluating 16 leading LLMs, we find that both metrics rise in parallel and escalate with task difficulty for most models. Moreover, increasing model capacity does not always reduce deception, posing a significant challenge for future LLM development.
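The abstract describes the two metrics only qualitatively, so the following is a hypothetical illustration of the two underlying ideas, not the paper's actual definitions. It assumes each question has a known yes/no ground truth. It treats an accuracy asymmetry between true-"yes" and true-"no" questions as a stand-in for bias toward a hidden objective, and treats disagreement between the model's answer and its answer to a simpler follow-up probe (assumed here to reflect its internal belief) as a stand-in for belief/output inconsistency. The names CSQItem, intention_bias and behavior_inconsistency are invented for this sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class CSQItem:
    # Hypothetical per-question record; field names are illustrative, not from the paper.
    truth: bool         # ground-truth yes/no answer to the question
    answer: bool        # the model's expressed answer to the full question
    probe_answer: bool  # model's answer to a simpler probe, ASSUMED to reflect its belief


def intention_bias(items):
    """Hypothetical bias score: log ratio of accuracy on true-'yes' vs true-'no' items.

    0 means no asymmetry; a large magnitude means the model systematically
    favours one answer regardless of the ground truth.
    """
    yes = [i for i in items if i.truth]
    no = [i for i in items if not i.truth]
    if not yes or not no:
        raise ValueError("need at least one item with each ground-truth label")
    acc_yes = sum(i.answer == i.truth for i in yes) / len(yes)
    acc_no = sum(i.answer == i.truth for i in no) / len(no)
    eps = 1e-9  # keeps the log finite when one accuracy is zero
    return math.log((acc_yes + eps) / (acc_no + eps))


def behavior_inconsistency(items):
    """Hypothetical inconsistency score: fraction of items whose expressed answer
    contradicts the probe answer used as a proxy for internal belief."""
    if not items:
        raise ValueError("need at least one item")
    return sum(i.answer != i.probe_answer for i in items) / len(items)


if __name__ == "__main__":
    toy = [
        CSQItem(truth=True, answer=True, probe_answer=True),
        CSQItem(truth=True, answer=True, probe_answer=True),
        CSQItem(truth=False, answer=True, probe_answer=False),  # says "yes" despite believing "no"
        CSQItem(truth=False, answer=False, probe_answer=False),
    ]
    print(intention_bias(toy), behavior_inconsistency(toy))
```

With these four toy items the model is always right on true-"yes" questions but right on only half of the true-"no" questions, so intention_bias returns roughly log 2 (about 0.693). One of the four expressed answers contradicts its probe, so behavior_inconsistency returns 0.25. In this toy case both scores are driven by the same item, loosely echoing the abstract's observation that the two metrics rise together.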

representative citing papers

Position: Anthropomorphic Misalignment Research Needs Stronger Evidence

cs.CY · 2026-05-29 · unverdicted · novelty 3.0 · ref 57

Position paper calling for stronger evidentiary standards and a diagnostic checklist in anthropomorphic misalignment research.
