Linear mode connectivity and the lottery ticket hypothesis
2 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: 2 unverdicted
citing papers explorer
- Steerable Adversarial Scenario Generation through Test-Time Preference Alignment
  SAGE reframes adversarial scenario generation as multi-objective preference alignment, using hierarchical group-based optimization and test-time linear interpolation of two expert policies to enable steerable control over adversariality-realism trade-offs.
- Junk DNA Hypothesis: Pruning Small Pre-Trained Weights Irreversibly and Monotonically Impairs "Difficult" Downstream Tasks in LLMs
  Pruning small-magnitude weights from pre-trained LLMs causes monotonic irreversible performance degradation on difficult downstream tasks, supporting the Junk DNA Hypothesis that these weights hold essential knowledge.