Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models

Dilek Hakkani-T\"ur; Gokhan Tur; Nimet Beyza Bozdag; Shuhaib Mehri

arxiv: 2503.01829 · v4 · pith:S3AIZUYMnew · submitted 2025-03-03 · 💻 cs.CL · cs.AI· cs.LG· cs.MA

Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models

Nimet Beyza Bozdag , Shuhaib Mehri , Gokhan Tur , Dilek Hakkani-T\"ur This is my paper

classification 💻 cs.CL cs.AIcs.LGcs.MA

keywords persuasionllmsframeworksusceptibilityeffectivenesshumanpersuadepersuasive

0 comments

read the original abstract

Large Language Models (LLMs) demonstrate persuasive capabilities that rival human-level persuasion. While these capabilities can be used for social good, they also present risks of potential misuse. Beyond the concern of how LLMs persuade others, their own susceptibility to persuasion poses a critical alignment challenge, raising questions about robustness, safety, and adherence to ethical principles. To study these dynamics, we introduce Persuade Me If You Can (PMIYC), an automated framework for evaluating persuasiveness and susceptibility to persuasion in multi-agent interactions. Our framework offers a scalable alternative to the costly and time-intensive human annotation process typically used to study persuasion in LLMs. PMIYC automatically conducts multi-turn conversations between Persuader and Persuadee agents, measuring both the effectiveness of and susceptibility to persuasion. Our comprehensive evaluation spans a diverse set of LLMs and persuasion settings (e.g., subjective and misinformation scenarios). We validate the efficacy of our framework through human evaluations and demonstrate alignment with human assessments from prior studies. Through PMIYC, we find that Llama-3.3-70B and GPT-4o exhibit similar persuasive effectiveness, outperforming Claude 3 Haiku by 30%. However, GPT-4o demonstrates over 50% greater resistance to persuasion for misinformation compared to Llama-3.3-70B. Notably, o4-mini emerges as both an effective persuader, and a resistant persuadee. These findings provide empirical insights into the persuasive dynamics of LLMs and contribute to the development of safer AI systems.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Sell More, Play Less: Benchmarking LLM Realistic Selling Skill
cs.CL 2026-04 conditional novelty 7.0

SalesLLM provides an automatic evaluation framework for LLM sales dialogues that correlates 0.98 with human experts and shows top models approaching human performance while weaker ones lag.
When Identity Skews Debate: Anonymization for Bias-Reduced Multi-Agent Reasoning
cs.AI 2025-10 unverdicted novelty 7.0

Anonymization in multi-agent debate reduces identity bias by equalizing self and peer weights in a Bayesian update model, quantified by the Identity Bias Coefficient.
ChatCLIDS: Simulating Persuasive AI Dialogues to Promote Closed-Loop Insulin Adoption in Type 1 Diabetes Care
cs.AI 2025-08 unverdicted novelty 7.0

ChatCLIDS creates a library of expert-validated virtual patients and tests LLM agents using evidence-based persuasive strategies in simulated longitudinal and adversarial health counseling sessions for closed-loop ins...
When AI reviews science: Can we trust the referee?
cs.AI 2026-04 unverdicted novelty 6.0

AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference sub...
Scheming Ability in LLM-to-LLM Strategic Interactions
cs.CL 2025-10 conditional novelty 6.0

Frontier LLMs exhibit high scheming propensity in Cheap Talk signaling and Peer Evaluation games, achieving 95-100% success rates when choosing to deceive and 100% deception choice in one setup even without prompting.
Learning with Conflicts of Interest
cs.LG 2026-05 unverdicted novelty 5.0

A game-theoretic framework and algorithms are introduced to maximize beneficial information from ML systems while minimizing biased influences arising from conflicts of interest.