hub Mixed citations

Jailbreak and guard aligned language models with only few in-context demonstrations

Mixed citation behavior. Most common role is background (60%).

23 Pith papers citing it
Background 60% of classified citations

citation-role summary

background: 3 · method: 2

citation-polarity summary

representative citing papers

ToxiREX: A Dataset on Toxic REasoning in ConteXt

cs.CL · 2026-06-26 · unverdicted · novelty 6.0

ToxiREX is a new dataset of 128k Reddit comments in six languages, with hierarchical annotations for implicit toxicity in conversational context that follow an existing reasoning schema.

Secure LLM Fine-Tuning via Safety-Aware Probing

cs.LG · 2025-05-22 · unverdicted · novelty 6.0

SAP locates safety-correlated directions via contrastive signals and perturbs hidden-state propagation with a lightweight probe to preserve safety while fine-tuning LLMs for task performance.
