REALISTA generates semantically coherent adversarial prompts via latent-space optimization over input-dependent editing directions, achieving stronger hallucination elicitation than prior realistic attacks on open-source and reasoning LLMs.
arXiv preprint arXiv:2403.01251
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
verdicts
UNVERDICTED 3representative citing papers
ASTRA is an automated closed-loop framework that discovers, retrieves, and evolves jailbreak attack strategies for LLMs using a dynamic three-tier strategy library and outperforms baselines in black-box settings.
Faster-GCG improves GCG efficiency 8x via regularization, temperature sampling, and duplicate avoidance, reaching 78.1% success rate with 32K evaluations across five aligned LLMs.
citing papers explorer
No citing papers match the current filters.