What Does BERT Look at? An Analysis of BERT's Attention

2 Pith papers cite this work. Polarity classification is still indexing.

Fields: cs.AI (1), cs.LG (1)

Years: 2026 (2)

Verdicts: unverdicted (2)

representative citing papers

SLASH the Sink: Sharpening Structural Attention Inside LLMs

cs.AI · 2026-05-11 · unverdicted · novelty 6.0 · 3 refs

SLASH is a plug-and-play attention redistribution technique that counters attention sinks to enhance LLMs' intrinsic graph topology reconstruction without any training or fine-tuning.

Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

AdaLeZO uses a non-stationary multi-armed bandit to adaptively allocate perturbation budget across layers in zeroth-order optimization and applies inverse probability weighting to reduce variance while preserving unbiased gradients, delivering 1.7x-3.0x wall-clock speedup on LLaMA and OPT models.

citing papers explorer

  • SLASH the Sink: Sharpening Structural Attention Inside LLMs cs.AI · 2026-05-11 · unverdicted · none · ref 20 · 3 links

  • Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling cs.LG · 2026-04-20 · unverdicted · none · ref 8
