Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models
read the original abstract
Large Language Models (LLMs) demonstrate persuasive capabilities that rival human-level persuasion. While these capabilities can be used for social good, they also present risks of potential misuse. Beyond the concern of how LLMs persuade others, their own susceptibility to persuasion poses a critical alignment challenge, raising questions about robustness, safety, and adherence to ethical principles. To study these dynamics, we introduce Persuade Me If You Can (PMIYC), an automated framework for evaluating persuasiveness and susceptibility to persuasion in multi-agent interactions. Our framework offers a scalable alternative to the costly and time-intensive human annotation process typically used to study persuasion in LLMs. PMIYC automatically conducts multi-turn conversations between Persuader and Persuadee agents, measuring both the effectiveness of and susceptibility to persuasion. Our comprehensive evaluation spans a diverse set of LLMs and persuasion settings (e.g., subjective and misinformation scenarios). We validate the efficacy of our framework through human evaluations and demonstrate alignment with human assessments from prior studies. Through PMIYC, we find that Llama-3.3-70B and GPT-4o exhibit similar persuasive effectiveness, outperforming Claude 3 Haiku by 30%. However, GPT-4o demonstrates over 50% greater resistance to persuasion for misinformation compared to Llama-3.3-70B. Notably, o4-mini emerges as both an effective persuader, and a resistant persuadee. These findings provide empirical insights into the persuasive dynamics of LLMs and contribute to the development of safer AI systems.
This paper has not been read by Pith yet.
Forward citations
Cited by 6 Pith papers
-
Sell More, Play Less: Benchmarking LLM Realistic Selling Skill
SalesLLM provides an automatic evaluation framework for LLM sales dialogues that correlates 0.98 with human experts and shows top models approaching human performance while weaker ones lag.
-
When Identity Skews Debate: Anonymization for Bias-Reduced Multi-Agent Reasoning
Anonymization in multi-agent debate reduces identity bias by equalizing self and peer weights in a Bayesian update model, quantified by the Identity Bias Coefficient.
-
ChatCLIDS: Simulating Persuasive AI Dialogues to Promote Closed-Loop Insulin Adoption in Type 1 Diabetes Care
ChatCLIDS creates a library of expert-validated virtual patients and tests LLM agents using evidence-based persuasive strategies in simulated longitudinal and adversarial health counseling sessions for closed-loop ins...
-
When AI reviews science: Can we trust the referee?
AI peer review systems are vulnerable to prompt injections, prestige biases, assertion strength effects, and contextual poisoning, as demonstrated by a new attack taxonomy and causal experiments on real conference sub...
-
Scheming Ability in LLM-to-LLM Strategic Interactions
Frontier LLMs exhibit high scheming propensity in Cheap Talk signaling and Peer Evaluation games, achieving 95-100% success rates when choosing to deceive and 100% deception choice in one setup even without prompting.
-
Learning with Conflicts of Interest
A game-theoretic framework and algorithms are introduced to maximize beneficial information from ML systems while minimizing biased influences arising from conflicts of interest.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.