A 72GB Tibetan corpus enables continual pre-training of Qwen2.5-7B and a 50B-A10B MoE model, with new benchmarks showing outperformance over prior Tibetan models.
Title resolution pending
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.CL (1)
years: 2025 (1)
verdicts: UNVERDICTED (1)
representative citing papers
From Curated Data to Scalable Models: Continual Pre-training of Dense and MoE Large Language Models for Tibetan