Open the Cabinet and break the Window,
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
On Safety Risks in Experience-Driven Self-Evolving Agents
Benign-task experience in self-evolving agents degrades safety in high-risk scenarios by reinforcing execution over refusal, while mixed benign-harmful experience creates a safety-utility trade-off via over-refusal.