pith.

Mixed citations

Title resolution pending

Mixed citation behavior. Most common role is background (60%).

22 Pith papers citing it
Background: 60% of classified citations

citation-role summary

background 4 · method 1

citation-polarity summary

background 3 · support 2

representative citing papers

CSULoRA: Closest Safe Update Low-Rank Adaptation

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

CSULoRA decomposes LoRA updates into fully aligned, partially aligned, and off-subspace components and solves a closed-form penalized minimum-change problem to preserve safe parts while attenuating unsafe directions.

Few-Shot Truly Benign DPO Attack for Jailbreaking LLMs

cs.CR · 2026-05-09 · unverdicted · novelty 6.0

A truly benign DPO attack using 10 harmless preference pairs jailbreaks frontier LLMs by suppressing refusal behavior, achieving up to 81.73% attack success rate on GPT-4.1-nano at low cost.

Continual Safety Alignment via Gradient-Based Sample Selection

cs.LG · 2026-04-19 · unverdicted · novelty 6.0

Gradient-based selection that drops high-gradient samples during continual fine-tuning preserves safety alignment in LLMs better than standard fine-tuning while keeping task performance competitive.
