pith. sign in

Attention is off by one, 2023

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

citation-role summary

background 2

citation-polarity summary

fields

cs.CL 2

years

2024 1 2023 1

roles

background 2

polarities

background 2

representative citing papers

Massive Activations in Large Language Models

cs.CL · 2024-02-27 · unverdicted · novelty 7.0

Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.

Efficient Streaming Language Models with Attention Sinks

cs.CL · 2023-09-29 · accept · novelty 6.0

StreamingLLM lets finite-window LLMs generalize to infinite-length sequences by retaining initial-token KV states as attention sinks, enabling stable streaming inference up to 4M tokens.

citing papers explorer

Showing 2 of 2 citing papers.

  • Massive Activations in Large Language Models cs.CL · 2024-02-27 · unverdicted · none · ref 135

    Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.

  • Efficient Streaming Language Models with Attention Sinks cs.CL · 2023-09-29 · accept · none · ref 35

    StreamingLLM lets finite-window LLMs generalize to infinite-length sequences by retaining initial-token KV states as attention sinks, enabling stable streaming inference up to 4M tokens.