pith. sign in

arXiv preprint arXiv:2203.16634 , year=

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

citation-role summary

background 2 method 1

citation-polarity summary

verdicts

UNVERDICTED 10

representative citing papers

Dual Triangle Attention: Effective Bidirectional Attention Without Positional Embeddings

q-bio.QM · 2026-04-09 · unverdicted · novelty 7.0

Dual Triangle Attention achieves effective bidirectional attention with built-in positional inductive bias via dual triangular masks, outperforming standard bidirectional attention on position-sensitive tasks and showing strong masked language modeling results with or without positional embeddings.

Group Representational Position Encoding

cs.LG · 2025-12-08 · unverdicted · novelty 7.0

GRAPE unifies RoPE and ALiBi as special cases of group actions on positions, providing a principled design space for positional encodings via SO(d) rotations and GL unipotent transformations.

Massive Activations in Large Language Models

cs.CL · 2024-02-27 · unverdicted · novelty 7.0

Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.

Flamingo: a Visual Language Model for Few-Shot Learning

cs.CV · 2022-04-29 · unverdicted · novelty 7.0

Flamingo models reach new state-of-the-art few-shot results on image and video tasks by bridging frozen vision and language models with cross-attention layers trained on interleaved web-scale data.

Graphical einops: bridging tensor networks and computation graphs

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

Introduces a graphical calculus with nested graded tubes bridging tensor networks and computation graphs for einops, turning equivariance proofs into diagrammatic derivations and enabling efficient sparse attention via mask preprocessing.

Positional Encoding via Token-Aware Phase Attention

cs.CL · 2025-09-16 · unverdicted · novelty 6.0

TAPA adds a learnable phase function to attention to preserve long-range token interactions, enabling direct continual pretraining, length extrapolation, lower perplexity, and stronger retrieval than RoPE-style methods.

citing papers explorer

Showing 10 of 10 citing papers.