What Does BERT Look at? An Analysis of BERT's Attention

Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning · 2019

2 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

SLASH the Sink: Sharpening Structural Attention Inside LLMs

cs.AI · 2026-05-11 · unverdicted · novelty 6.0 · 3 refs

SLASH is a plug-and-play attention redistribution technique that counters attention sinks to enhance LLMs' intrinsic graph topology reconstruction without any training or fine-tuning.

Universally Empowering Zeroth-Order Optimization via Adaptive Layer-wise Sampling

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

AdaLeZO uses a non-stationary multi-armed bandit to adaptively allocate perturbation budget across layers in zeroth-order optimization and applies inverse probability weighting to reduce variance while preserving unbiased gradients, delivering 1.7x-3.0x wall-clock speedup on LLaMA and OPT models.
