PCSA is the first persona-based client simulation attack that exposes LLMs' vulnerabilities in counseling by generating natural dialogues where models give bad advice, reinforce delusions, and encourage risky actions.
InProceedings of the 2025 Conference on Empiri- cal Methods in Natural Language Processing, pages 24278–24306, Suzhou, China
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Do No Harm: Exposing Hidden Vulnerabilities of LLMs via Persona-based Client Simulation Attack in Psychological Counseling
PCSA is the first persona-based client simulation attack that exposes LLMs' vulnerabilities in counseling by generating natural dialogues where models give bad advice, reinforce delusions, and encourage risky actions.