org/abs/2502.01042

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

years: 2026 (4) · 2025 (2)

verdicts: UNVERDICTED (6)

roles: background (1)

polarities: background (1)

representative citing papers

Graph-Regularized Sparse Autoencoders for LLM Safety Steering

cs.LG · 2025-12-07 · unverdicted · novelty 6.0

GSAE improves selective refusal on safety benchmarks by smoothing SAE directions over a co-activation graph and applying them via a two-gate controller, outperforming standard SAEs and baselines on Llama-3 and other models.

Self-Aligned Reward: Towards Effective and Efficient Reasoners

cs.LG · 2025-09-05 · unverdicted · novelty 5.0

Self-aligned reward uses relative perplexity differences to encourage concise, query-specific reasoning in LLMs, yielding 4% accuracy gains and 30% lower inference cost when added to PPO or GRPO.
