MSA is an end-to-end trainable memory model using sparse attention and document-wise RoPE that scales to 100M tokens with linear complexity and less than 9% degradation.
Qwen2.5 technical report, 2025
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
In linear regression, LoRA can achieve lower excess risk than full fine-tuning when the pretraining-downstream difference is low-rank, and small LoRA ranks can improve generalization by acting as regularization.
GoodServe proposes a predict-and-rectify routing system for agentic LLM inferences on heterogeneous GPUs that improves goodput by up to 27.4%.
citing papers explorer
-
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens
MSA is an end-to-end trainable memory model using sparse attention and document-wise RoPE that scales to 100M tokens with linear complexity and less than 9% degradation.
-
LoRA vs. Full Fine-Tuning: A Theoretical Perspective
In linear regression, LoRA can achieve lower excess risk than full fine-tuning when the pretraining-downstream difference is low-rank, and small LoRA ranks can improve generalization by acting as regularization.
-
GoodServe: Towards High-Goodput Serving of Agentic LLM Inferences over Heterogeneous Resources
GoodServe proposes a predict-and-rectify routing system for agentic LLM inferences on heterogeneous GPUs that improves goodput by up to 27.4%.