Industrial-scale LLMs require more than 150B tokens of long-context continual pre-training to reach intrinsic saturation, and perplexity and retrieval-head attention provide stronger signals of training progress than needle-in-a-haystack (NIAH) tests.
In Findings of the Association for Computational Linguistics: NAACL 2025, pages 1496–1524
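As a rough illustration of the two signals named in the summary, the sketch below computes perplexity from per-token negative log-likelihoods, applies a plateau test across checkpoints, and scores one attention head with a simplified retrieval score (the fraction of needle tokens the head copies by attending to them). The function names, the `rel_tol`/`window` thresholds, and the exact scoring rule are hypothetical choices for illustration. They are not the paper's procedure.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


def has_saturated(ppls: Sequence[float], rel_tol: float = 0.005, window: int = 3) -> bool:
    """Hypothetical plateau test (an assumption, not the paper's criterion):
    True when each of the last `window` checkpoint-to-checkpoint relative
    perplexity drops is below `rel_tol`. Perplexities are assumed positive."""
    if len(ppls) < window + 1:
        return False  # not enough checkpoints to judge a plateau
    recent = ppls[-(window + 1):]
    return all((prev - curr) / prev < rel_tol for prev, curr in zip(recent, recent[1:]))


def retrieval_score(
    context: Sequence[str],
    needle_start: int,
    needle_end: int,
    generated: Sequence[str],
    argmax_pos: Sequence[int],
) -> float:
    """Simplified, hypothetical retrieval score for one head: the fraction of
    needle positions [needle_start, needle_end) that the head 'copies', i.e.
    its most-attended position at some decoding step lies inside the needle
    and holds the token generated at that step."""
    needle_len = needle_end - needle_start
    if needle_len <= 0:
        raise ValueError("empty needle span")
    copied = {
        pos
        for tok, pos in zip(generated, argmax_pos)
        if needle_start <= pos < needle_end and context[pos] == tok
    }
    return len(copied) / needle_len


if __name__ == "__main__":
    # Toy inputs only; real values would come from model evaluations.
    print(perplexity([2.1, 1.9, 2.0]))
    print(has_saturated([12.0, 9.5, 8.61, 8.59, 8.58, 8.575]))
    ctx = ["The", "code", "is", "7", "4", "2", "."]
    # The head attends to the needle at steps 1 and 2, then drifts to position 1.
    print(retrieval_score(ctx, 3, 6, ["7", "4", "2"], [3, 4, 1]))
```

In practice, perplexity would be measured on held-out long documents at each checkpoint, and the per-step attention argmax would be read from the model's attention weights. The plateau thresholds are placeholders you would need to tune.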
2 Pith papers cite this work.
Citing papers by field: cs.CL (2). By year: 2026 (2). By verdict: UNVERDICTED (2).

Citing papers:
- Revealing the Learning Dynamics of Long-Context Continual Pre-training
  Industrial-scale LLMs require more than 150B tokens of long-context continual pre-training to reach intrinsic saturation, and perplexity and retrieval-head attention provide stronger signals of training progress than needle-in-a-haystack (NIAH) tests.
- Ling and Ring 2.6 Technical Report: Efficient and Instant Agentic Intelligence at Trillion-Parameter Scale
  Technical report announcing the Ling-2.6 and Ring-2.6 models, featuring hybrid linear attention, evolutionary CoT, and KPop RL for efficient agentic intelligence at scale.