Multi-generation sampling from LLMs uncovers more jailbreak behaviors than single generations, with the largest gains from one to moderate sample counts and diminishing returns thereafter.
17 In: Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, pp
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
An Empirical Study of Multi-Generation Sampling for Jailbreak Detection in Large Language Models
Multi-generation sampling from LLMs uncovers more jailbreak behaviors than single generations, with the largest gains from one to moderate sample counts and diminishing returns thereafter.