You only cache once: Decoder-decoder architectures for language models

8 Pith papers cite this work. Polarity classification is still indexing.

citation summary

verdicts: unverdicted 8
roles: background 1
polarities: background 1

representative citing papers

Block-Based Double Decoders

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

Block-based double decoders use doubly-causal block attention masks to combine decoder-only training efficiency with encoder-decoder inference efficiency, outperforming standard encoder-decoders in scaling experiments.

Q-Delta: Beyond Key-Value Associative State Evolution

cs.AI · 2026-06-07 · unverdicted · novelty 5.0

Q-Delta extends linear attention by introducing a query-conditioned delta rule that incorporates mixed key-query errors into recurrent state updates for improved stability and performance.

Gated Delta Networks: Improving Mamba2 with Delta Rule

cs.CL · 2024-12-09 · unverdicted · novelty 5.0

Gated DeltaNet integrates gating and delta rules into linear transformers, outperforming Mamba2 and DeltaNet on language modeling, reasoning, retrieval, and long-context tasks.
