Longformer uses local windowed attention plus task-specific global attention to achieve linear scaling and state-of-the-art results on long-document language modeling, QA, and summarization after pretraining.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
representative citing papers
Task-aligned supervised geometric stability predicts linear steerability with high accuracy while unsupervised stability detects representational drift earlier and with lower false alarms than CKA or Procrustes.
citing papers explorer
-
Longformer: The Long-Document Transformer
Longformer uses local windowed attention plus task-specific global attention to achieve linear scaling and state-of-the-art results on long-document language modeling, QA, and summarization after pretraining.
-
The Geometric Canary: Predicting Steerability and Detecting Drift via Representational Stability
Task-aligned supervised geometric stability predicts linear steerability with high accuracy while unsupervised stability detects representational drift earlier and with lower false alarms than CKA or Procrustes.