Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders. arXiv preprint arXiv:2410.20526, 2024

16 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

years: 2026 (15), 2025 (1)

representative citing papers

How Language Models Process Negation

cs.CL · 2026-05-04 · unverdicted · novelty 7.0 · 2 refs

LLMs process negation using both attention-based suppression and constructive representation mechanisms (construction dominant), with late-layer attention shortcuts explaining poor accuracy on negation tasks.

Knowledge Vector of Logical Reasoning in Large Language Models

cs.CL · 2026-04-26 · unverdicted · novelty 6.0

Distinct linear knowledge vectors for deductive, inductive, and abductive reasoning in LLMs can be refined via complementary subspace constraints to improve performance through mutual knowledge sharing.

The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping

cs.LG · 2026-04-13 · unverdicted · novelty 6.0

MEDS improves LLM RL performance by up to 4.13 pass@1 and 4.37 pass@128 points by dynamically penalizing rollouts matching prevalent historical error clusters identified via memory-stored representations and density clustering.
