10 Pith papers cite this work.
Citation roles: background (1). Citation polarities: background (1).
Citing papers
- STARE: Step-wise Temporal Alignment and Red-teaming Engine for Multi-modal Toxicity Attack
  STARE uses step-wise reinforcement learning (RL) to attack multimodal models, achieving a 68% higher attack success rate and revealing that adversarial optimization concentrates conceptual toxicity early in the generation trajectory and detail toxicity late.
- Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging
  MERIT enables decentralized instruction tuning via conflict-aware PCA splitting and parameter-space merging, raising average benchmark scores above those of joint training on multimodal and text mixtures.
- Spectral Tempering for Embedding Compression in Dense Passage Retrieval
  Spectral Tempering derives an adaptive scaling factor γ(k) from the embedding eigenspectrum via local SNR analysis and knee-point normalization to achieve near-optimal compression without training or validation.
- Task Prompt Vectors: Effective Initialization through Multi-Task Soft-Prompt Transfer
  Task prompt vectors, formed by subtracting the random initialization from tuned soft prompts, support low-resource initialization and arithmetic combination across tasks on 12 NLU datasets, and remain independent of the initialization seed on two model architectures.
- FoeGlass: Simple In-Context Learning Is Enough for Red Teaming Audio Deepfake Detectors
  FoeGlass is a black-box red-teaming method that uses LLM in-context learning with diversity-based prompting to generate adversarial audio samples, raising the false negative rates of audio deepfake detection (ADD) models by up to 94% over baselines.
- From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
  HippoRAG 2 improves on standard RAG and on the original HippoRAG by adding deeper passage integration and more effective LLM use to its Personalized PageRank retrieval, delivering superior performance on factual, sense-making, and associative memory tasks, including a 7% gain in associative memory over the state of the art.
- Hierarchical Alignment: Enforcing Hierarchical Instruction-Following in LLMs through Logical Consistency
  NSHA improves how LLMs handle hierarchical instruction conflicts by combining solver-guided constraint satisfaction at inference time with distillation of those decisions into model parameters during training.
- PaliGemma: A versatile 3B VLM for transfer
  PaliGemma is an open 3B vision-language model (VLM) based on SigLIP and Gemma that achieves strong performance on nearly 40 diverse open-world tasks, including standard VLM benchmarks, remote sensing, and segmentation.
- From Safety Risk to Design Principle: Peer-Preservation in Multi-Agent LLM Systems and Its Implications for Orchestrated Democratic Discourse Analysis
  Peer-preservation in LLMs requires architectural mitigations, such as identity anonymization, rather than model selection to keep multi-agent systems for democratic discourse evaluation reliable.
- Distributional Open-Ended Evaluation of LLM Cultural Value Alignment Based on Value Codebook
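The task prompt vector entry describes a concrete operation: subtract the initialization from a tuned soft prompt, then combine the resulting vectors arithmetically. A minimal sketch of that arithmetic on plain Python lists follows, assuming soft prompts are equal-shaped (tokens x embedding dimension) matrices and that "combination" means element-wise addition onto an initialization, in the style of task arithmetic. The helper names and the `scale` parameter are hypothetical illustrations, not the paper's code.

```python
from typing import List

# A soft prompt as a (num_tokens x embedding_dim) matrix of floats.
Prompt = List[List[float]]


def _check_same_shape(a: Prompt, b: Prompt) -> None:
    """Raise if two prompts differ in token count or embedding width."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("prompts must have identical shapes")


def task_prompt_vector(tuned: Prompt, init: Prompt) -> Prompt:
    """Element-wise difference: tuned soft prompt minus its initialization."""
    _check_same_shape(tuned, init)
    return [[t - i for t, i in zip(rt, ri)] for rt, ri in zip(tuned, init)]


def apply_task_vectors(base: Prompt, vectors: List[Prompt], scale: float = 1.0) -> Prompt:
    """Return base + scale * sum(vectors) without mutating base.

    Hypothetical helper: adding vectors onto an initialization with one
    uniform `scale` is an assumption made for illustration.
    """
    result = [row[:] for row in base]  # copy so the caller's prompt is untouched
    for vec in vectors:
        _check_same_shape(result, vec)
        for r_row, v_row in zip(result, vec):
            for j, v in enumerate(v_row):
                r_row[j] += scale * v
    return result


if __name__ == "__main__":
    init = [[0.25, -0.5], [0.0, 0.75]]
    tuned_a = [[0.75, 0.0], [0.5, 0.25]]
    tuned_b = [[0.25, 0.5], [-0.25, 0.75]]
    tau_a = task_prompt_vector(tuned_a, init)
    tau_b = task_prompt_vector(tuned_b, init)
    print(apply_task_vectors(init, [tau_a, tau_b]))  # prints [[0.75, 1.0], [0.25, 0.25]]
```

Keeping the shape check explicit matters here because `zip` silently truncates mismatched rows, which would otherwise hide a prompt-length mismatch between tasks.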
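The HippoRAG 2 entry mentions Personalized PageRank (PPR). For reference, a minimal generic PPR sketch using power iteration follows, assuming a plain adjacency-list graph and non-negative seed weights (for example, nodes matched to a query). It is not HippoRAG 2's retrieval pipeline; the function name, parameters, and the choice to return dangling-node mass to the seed distribution are illustrative assumptions.

```python
from typing import Dict, List


def personalized_pagerank(
    graph: Dict[str, List[str]],
    seeds: Dict[str, float],
    damping: float = 0.85,
    tol: float = 1e-12,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Generic power-iteration Personalized PageRank (illustrative sketch).

    graph: adjacency list; every neighbor must also appear as a key.
    seeds: non-negative personalization weights; keys not in graph are ignored.
    """
    nodes = list(graph)
    for n in nodes:
        for m in graph[n]:
            if m not in graph:
                raise KeyError(f"neighbor {m!r} of {n!r} is not a node")
    total = sum(seeds.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("seed weights must sum to a positive value")
    # Restart distribution: seed weights normalized to sum to 1.
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(max_iter):
        # With probability 1 - damping, the walker jumps back to a seed node.
        new = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                # Spread this node's damped mass evenly over its out-edges.
                share = damping * rank[n] / len(out)
                for m in out:
                    new[m] += share
            else:
                # Dangling node: return its damped mass to the restart distribution.
                for m in nodes:
                    new[m] += damping * rank[n] * restart[m]
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    scores = personalized_pagerank(chain, {"a": 1.0})
    print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

Because every step redistributes all probability mass, the scores always sum to 1, so they can be compared directly across nodes; seeding a different node shifts the ranking toward its neighborhood.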