pith. sign in

arxiv: 2601.14340 · v2 · pith:Y7QOTFDQnew · submitted 2026-01-20 · 💻 cs.CR · cs.LG

Turn-Based Structural Triggers: Prompt-Free Backdoors in Multi-Turn LLMs

classification 💻 cs.CR cs.LG
keywords dialoguemulti-turnstructuraltriggeruseracrossattackbackdoor
0
0 comments X
read the original abstract

Large Language Models (LLMs) are widely integrated into interactive systems such as dialogue agents and task-oriented assistants. This growing ecosystem also raises supply-chain risks, where adversaries can distribute poisoned models that degrade downstream reliability and user trust. Existing backdoor attacks and defenses are largely prompt-centric, focusing on user-visible triggers while overlooking structural signals in multi-turn conversations. We propose Turn-based Structural Trigger (TST), a backdoor attack that activates from dialogue structure, using the turn index as the trigger and remaining independent of user inputs. This creates a structure-conditioned reliability risk: a backdoored model can pass prompt-centric checks and standard utility evaluations, yet execute attacker-specified behaviors at selected dialogue positions without any trigger in the user input. Across four open-source LLM families, TST achieves a 99.52% average ASR while largely preserving non-triggered utility, and remains effective across unseen dialogue datasets and representative defenses. These results reveal dialogue structure as an overlooked attack surface and motivate structure-aware multi-turn auditing beyond prompt inspection.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.