Behavioral geometry of model populations enables high-accuracy jailbreak susceptibility prediction and defense transfer with 98% fewer evaluations.
Available: https://arxiv.org/abs/2402.04436
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
Gini MDS replaces Euclidean distance in multidimensional scaling with a rank-and-value-based pseudo-distance controlled by a hyperparameter, claimed to yield more robust embeddings on noisy data than standard MDS.
citing papers explorer
- Jailbreak susceptibility prediction and mitigation via the behavioral geometry of models
  Behavioral geometry of model populations enables high-accuracy jailbreak susceptibility prediction and defense transfer with 98% fewer evaluations.
- Optimizing Multidimensional Scaling in Gini Metric Spaces
  Gini MDS replaces Euclidean distance in multidimensional scaling with a rank-and-value-based pseudo-distance controlled by a hyperparameter, claimed to yield more robust embeddings on noisy data than standard MDS.