

Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned

19 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

background: 2 · method: 1

citation-polarity summary

representative citing papers

Interpretability Can Be Actionable

cs.LG · 2026-05-11 · conditional · novelty 6.0

Interpretability research should be judged by actionability—the degree to which its insights support concrete decisions and interventions—rather than explanatory power alone.

Flag Varieties: A Geometric Framework for Deep Network Alignment

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Alignment in deep networks is governed by flag varieties, with subspace intersection dimension as the unique reparameterization-invariant observable, explaining regularization and activation effects from first principles.

Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training

cs.AI · 2025-09-30 · unverdicted · novelty 6.0

Post-training on reasoning tasks sparks the emergence of specialized attention heads that enable structured computation, with SFT adding stable heads while GRPO uses dynamic activation and pruning tied to reward signals, and controllable think models relying on compensatory heads instead of specific

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

cs.CL · 2023-10-03 · conditional · novelty 6.0

FastGen adaptively compresses LLM KV caches via lightweight attention profiling: evicting long-range contexts on local heads, non-special tokens on special-token heads, and retaining full caches on broad-attention heads, yielding substantial memory savings with negligible quality loss.

EviRank: Evidence-Based Confidence Estimation for LLM-Based Ranking

cs.IR · 2026-06-03 · unverdicted · novelty 5.0

EviRank extracts three kinds of evidence from a single LLM forward pass, aggregates them with reliable opinion pooling and position-aware calibration, and uses the result to optimize rankings; it claims state-of-the-art results on recommendation and uncertainty quantification across three datasets.

Fast & Faithful Function Vectors

cs.CL · 2026-06-03 · unverdicted · novelty 4.0

LRP-based attention head selection and distributed application improve the efficiency and accuracy of function vectors for steering LLMs compared to prior choices.

Multilingual Vision-Language Models, A Survey

cs.CL · 2025-09-26 · accept · novelty 3.0

The survey identifies a key tension in multilingual vision-language models between language neutrality via contrastive learning and cultural awareness via diverse data, with most benchmarks relying on translation-based evaluation.
