The data provenance initiative: A large scale audit of dataset licensing & attribution in AI
2 Pith papers cite this work.
citing papers explorer
- StarCoder 2 and The Stack v2: The Next Generation
StarCoder2-15B matches or beats CodeLlama-34B on code tasks despite being smaller, and StarCoder2-3B outperforms prior 15B models, with open weights and exact training data identifiers released.
- The ATOM Report: Measuring the Open Language Model Ecosystem
Chinese open language models overtook U.S. models in summer 2025 and widened the gap, based on Hugging Face downloads, model derivatives, inference share, and performance data.